Computer science
Feature (linguistics)
Architecture
Artificial neural network
Fuse (electrical)
Artificial intelligence
Embedding
Network architecture
Scheme (mathematics)
Computation
Deep learning
Pattern recognition (psychology)
Speech recognition
Algorithm
Computer network
Engineering
Mathematical analysis
Philosophy
Electrical engineering
Art
Linguistics
Visual arts
Mathematics
Authors
Bei Liu, Zhengyang Chen, Qian Ye
Source
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
[Institute of Electrical and Electronics Engineers]
Date: 2023-01-01
Volume/pages: 31: 1825-1838
Identifier
DOI: 10.1109/taslp.2023.3273417
Abstract
Deep speaker embedding learning based on neural networks has become the predominant approach in speaker verification (SV). In prior studies, researchers have investigated various network architectures. However, few works pay attention to how to design and scale up networks in a principled way to achieve a better trade-off between model performance and computational complexity. In this paper, we focus on efficient architecture design for speaker verification. First, we systematically study the effect of network depth and width on performance and empirically find that depth is more important than width for the speaker verification task. Based on this observation, we propose a novel depth-first (DF) architecture design rule. By applying it to ResNet and ECAPA-TDNN, two new families of much deeper models, namely DF-ResNets and DF-ECAPAs, are constructed. In addition, to further boost the performance of small models in the low-computation regime, a novel attentive feature fusion (AFF) scheme is proposed to replace conventional feature fusion methods. Specifically, we design two fusion strategies, sequential AFF (S-AFF) and parallel AFF (P-AFF), which can dynamically fuse features in a learnable way. Experimental results on the VoxCeleb dataset show that the newly proposed DF-ResNets and DF-ECAPAs achieve a much better trade-off between performance and complexity than the original ResNet and ECAPA-TDNN. Moreover, small models obtain up to 40% relative improvement in EER by adopting the AFF scheme, with negligible computational cost. Finally, a comprehensive comparison with other published SV systems shows that our proposed models achieve the best trade-off between performance and complexity in both low- and high-computation scenarios.
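As a rough illustration of the general idea behind attentive feature fusion (the abstract does not give the exact S-AFF/P-AFF formulation, so this is not the authors' implementation), the sketch below combines two feature maps with input-dependent weights produced by a small learnable attention branch instead of plain addition or concatenation. The module name `AttentiveFusion`, the bottleneck sizes, and the pooling choice are all assumptions for illustration only.

```python
import torch
import torch.nn as nn


class AttentiveFusion(nn.Module):
    """Minimal sketch of attentive feature fusion: two feature maps of the
    same shape are merged with learned, input-dependent per-channel weights.
    All layer sizes are illustrative, not taken from the paper."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Small bottleneck that maps pooled statistics to fusion weights in (0, 1).
        self.attn = nn.Sequential(
            nn.Conv1d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv1d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # x, y: (batch, channels, time) feature maps from two branches.
        s = (x + y).mean(dim=-1, keepdim=True)  # pooled summary of both inputs
        w = self.attn(s)                        # per-channel fusion weights
        return w * x + (1.0 - w) * y            # convex, learnable combination


if __name__ == "__main__":
    fuse = AttentiveFusion(channels=64)
    a = torch.randn(2, 64, 100)
    b = torch.randn(2, 64, 100)
    print(fuse(a, b).shape)  # torch.Size([2, 64, 100])
```

In a sequential arrangement, such a module would fuse a branch's output with the running feature stream stage by stage, whereas a parallel arrangement would fuse several branch outputs at once; either way the weighting is learned end to end, which is the property the abstract highlights over conventional fixed fusion.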