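Computer science
Speech recognition
Artificial neural network
Speaker recognition
Benchmark (surveying)
Discriminant
Context (archaeology)
Time delay neural network
Pattern recognition (psychology)
Word error rate
Inference
Artificial intelligence
Paleontology
Geodesy
Biology
Geography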
Authors
Yuhang Sun,Chenxing Li,Biao Li
Identifier
DOI:10.1109/icassp48485.2024.10448107
Abstract
Current speaker verification techniques rely heavily on neural networks to extract accurate and discriminative speaker representations. In this paper, we present Branchformer-based TDNN (B-TDNN), a novel architecture for extracting speaker embeddings that captures both global and local context within each computing unit. The proposed B-TDNN combines the Branchformer with the traditional TDNN architecture to capture contextual information effectively. Our experiments also demonstrate the validity of a smaller model, which attains strong results with fewer parameters. To further improve the efficiency of the model, we introduce a Branch Auxiliary Training (BAT) method: two branches are trained jointly, and only the more critical branch is used during inference. BAT reduces the parameter count of the model without compromising performance. Experimental results show that B-TDNN sets a new benchmark in speaker verification, delivering state-of-the-art results with an Equal Error Rate (EER) of 0.66% on the VoxCeleb1 trial file.
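The abstract reports its headline result as an Equal Error Rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. As an illustration only, the following is a minimal sketch of how an EER could be computed from verification trial scores. The function name `equal_error_rate`, the toy scores, and the convention that a trial is accepted when its score is at or above the threshold are all assumptions made for this example; this is not the authors' evaluation code, and production toolkits typically interpolate between thresholds rather than picking the closest one.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    target_scores: scores of same-speaker trials (should be high).
    nontarget_scores: scores of different-speaker trials (should be low).
    Returns (eer, threshold) at the threshold where the false acceptance
    rate (FAR) and false rejection rate (FRR) are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None  # (|FAR - FRR|, eer estimate, threshold)
    for t in thresholds:
        # Assumed convention: a trial is accepted when score >= t.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of FAR and FRR as the EER estimate.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, invented for illustration (not from the paper).
targets = [0.9, 0.8, 0.7, 0.3]
nontargets = [0.1, 0.2, 0.4, 0.6]
print(equal_error_rate(targets, nontargets))  # (0.25, 0.6)
```

In this toy example, one of four target trials is rejected and one of four non-target trials is accepted at threshold 0.6, giving an EER of 25%. The 0.66% reported in the abstract corresponds to the same quantity measured on the VoxCeleb1 trial list.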