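Computer science
Pooling
Classifier (UML)
Speech recognition
Artificial intelligence
Spoofing attack
Exploit
Pattern recognition (psychology)
Fusion
Machine learning
Linguistics
Philosophy
Computer network
Computer security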
Authors
Yinlin Guo,Haofan Huang,Xi Chen,He Zhao,Yuehai Wang
Identifier
DOI:10.1109/icassp48485.2024.10447923
Abstract
With the rapid development of speech synthesis and voice conversion technologies, audio deepfakes have become a serious threat to Automatic Speaker Verification (ASV) systems. Numerous countermeasures have been proposed to detect this type of attack. In this paper, we report our efforts to combine the self-supervised WavLM model with a Multi-Fusion Attentive classifier for audio deepfake detection. Our method is the first to use the WavLM model to extract features that are more conducive to spoofing detection. We then propose a novel Multi-Fusion Attentive (MFA) classifier based on the Attentive Statistics Pooling (ASP) layer. The MFA captures the complementary information of audio features at both the time and layer levels. Experiments demonstrate that our methods achieve state-of-the-art results on the ASVspoof 2021 DF set and competitive results on the ASVspoof 2019 and 2021 LA sets.
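The abstract names Attentive Statistics Pooling (ASP) as the building block of the MFA classifier. The following is a minimal NumPy sketch of a generic ASP layer over the time axis only, not the authors' MFA classifier: the layer-level fusion is not reproduced, and the function name, dimensions, and random weights (`W`, `b`, `v`) are assumptions for illustration. In practice the frame-level features would come from a model such as WavLM rather than from a random generator.

```python
import numpy as np

def attentive_statistics_pooling(h, W, b, v):
    """Pool frame-level features h of shape (T, D) into one (2*D,) vector.

    W (D, A), b (A,), and v (A,) are the attention parameters; here they
    are hypothetical placeholders, whereas a real model would learn them.
    """
    # One scalar attention score per frame: v^T tanh(W^T h_t + b)
    scores = np.tanh(h @ W + b) @ v              # shape (T,)
    scores = scores - scores.max()               # for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()  # softmax over time

    # Attention-weighted mean and standard deviation over time
    mean = alpha @ h                             # shape (D,)
    var = alpha @ (h ** 2) - mean ** 2
    std = np.sqrt(np.clip(var, 1e-8, None))
    return np.concatenate([mean, std])

# Toy input standing in for frame-level features (assumed sizes)
rng = np.random.default_rng(0)
T, D, A = 50, 8, 4
h = rng.standard_normal((T, D))
W = rng.standard_normal((D, A))
b = np.zeros(A)
v = rng.standard_normal(A)

pooled = attentive_statistics_pooling(h, W, b, v)
print(pooled.shape)
```

When the attention weights are uniform (for example, with `v` set to zeros), the output reduces to the plain mean and standard deviation over time, which is the non-attentive statistics pooling baseline.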