Computer science
Frame (networking)
Artificial intelligence
Speech recognition
Computer network
Authors
Fenting Liu,Feifei Xiong,Yiya Hao,Kechenying Zhou,Chenhui Zhang,Jinwei Feng
Identifier
DOI:10.1109/icassp48485.2024.10446581
Abstract
We present AS-pVAD, a lightweight neural network with an attentive score loss for frame-wise personalized voice activity detection (pVAD). Instead of using an external speaker embedding extractor with a large number of parameters, AS-pVAD employs a lightweight internal model to extract the target speaker embedding. A novel attentive score loss constraint is proposed to exploit these embedding cues for pVAD more effectively than conventional embedding concatenation. Through joint training with a regular VAD, AS-pVAD is further improved: it identifies the target speaker in enrollment cases while still functioning as a regular VAD in enrollment-less cases. Experimental results show that AS-pVAD achieves an average AUC-ROC above 0.9 in two-speaker scenarios under various noisy and reverberant environments. Our test set is also publicly released to the community to facilitate research in this area.