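Convolution (computer science)
Computer science
Feature (linguistics)
Scale (ratio)
Overlap-add method
Speech recognition
Artificial intelligence
Pattern recognition (psychology)
Mathematics
Fourier transform
Physics
Mathematical analysis
Philosophy
Fractional Fourier transform
Quantum mechanics
Fourier analysis
Artificial neural network
Linguistics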
Authors
Haochen Wu,Jie Zhang,Zhentao Zhang,Li Wang,Bin Gu,Wu Guo
Identifier
DOI:10.1109/icassp48485.2024.10446612
Abstract
Spoof speech detection (SSD) helps protect automatic speaker recognition systems against malicious attacks. However, spoofed utterances generated by different text-to-speech and voice conversion algorithms are highly diverse, so SSD systems generalize poorly to unseen spoofing attacks. To address this problem, we integrate multi-scale feature aggregation (MFA) and dynamic convolution operations into the anti-spoofing framework to detect the different local and global artifacts of unseen spoofing attacks. The proposed framework mainly consists of eight stacked MFA blocks; in each block, a light-Res2Net module captures multi-scale features, and the convolutional kernel is dynamically generated from the local and global statistical information of the inputs. Results on two benchmark datasets (ADD 2023 Fake Audio Detection and ASVspoof 2021 Logical Access) show that the proposed method outperforms existing state-of-the-art systems.
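The idea of a convolutional kernel generated from input statistics can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no architectural details, so the sketch assumes one common form of dynamic convolution, in which K candidate kernels are mixed with softmax weights computed from global statistics (mean and standard deviation) of the input. The names `dynamic_conv1d`, `kernels`, and `proj` are hypothetical, and the local statistics mentioned in the abstract are omitted for brevity.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D array.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def dynamic_conv1d(x, kernels, proj):
    """Illustrative input-dependent 1-D convolution (assumed form, not the paper's).

    x:       (T,) input feature sequence
    kernels: (K, W) candidate kernels
    proj:    (K, 2) hypothetical projection from [mean, std] to K logits
    """
    stats = np.array([x.mean(), x.std()])    # global statistics of the input
    weights = softmax(proj @ stats)           # (K,) mixing weights, sum to 1
    kernel = weights @ kernels                # (W,) kernel specific to this input
    # np.convolve performs true convolution (the kernel is flipped).
    y = np.convolve(x, kernel, mode="same")
    return y, weights

# Usage: with a zero projection, both candidate kernels are weighted equally.
x = np.arange(8, dtype=float)
kernels = np.array([[0.0, 1.0, 0.0],
                    [1.0, 1.0, 1.0]])
proj = np.zeros((2, 2))
y, weights = dynamic_conv1d(x, kernels, proj)
```

Because the mixing weights depend on `x`, two inputs with different statistics are filtered by different kernels, which is the property the abstract relies on for handling diverse spoofing artifacts.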