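Softmax function
Computer science
Speech recognition
Discriminative model
Spoofing attack
Classifier (UML)
Pattern recognition (psychology)
Artificial intelligence
Feature extraction
Focus (optics)
Artificial neural network
Computer network
Physics
Optics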
Authors
Chen Chen, Yaozu Song, Bohan Dai, Deyun Chen
Identifier
DOI:10.1016/j.neucom.2023.126799
Abstract
Automatic speaker verification (ASV) systems are highly vulnerable to synthetic speech attacks, and artifacts are the key spoofing cue for distinguishing real from synthetic speech. In this paper, we focus on detecting artifacts and propose the twice attention networks (TA-networks), an end-to-end network consisting of a feature extraction module and a back-end classifier. The feature extraction module is the core of the TA-networks; it is a twice attention Unet (TA-Unet) containing two sequential attention modules: (1) a five-layer U-shaped network with an attention gate, which first obtains the general contour of the artifacts, and (2) a softmax-based filter with an adaptive coefficient, which dynamically highlights the differences between frequencies; these differences can be regarded as fine-grained artifacts. After processing by the TA-Unet, the feature maps of real and synthetic speech become more discriminative for the back-end SCG-Res2Net50 classifier. Experimental results show that the TA-networks achieve an equal error rate (EER) of 1.62% on the ASVspoof 2019 logical access (LA) sub-challenge, significantly better than most of the other models in the experiments.
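The abstract describes the second attention stage only as a softmax-based filter with an adaptive coefficient that highlights differences between frequencies. The sketch below is a hypothetical illustration of that idea, not the authors' formulation: it summarizes each frequency bin of a (frequency × time) feature map by its mean over time, turns those summaries into weights with a softmax scaled by a coefficient `alpha`, and reweights the map. The function names, the mean-over-time statistic, and the rescaling by the number of bins are all assumptions; in a trained network the coefficient would likely be a learned parameter rather than an argument.

```python
import math
from typing import List


def softmax(values: List[float], alpha: float) -> List[float]:
    """Numerically stable softmax of alpha * values (alpha acts like an inverse temperature)."""
    scaled = [alpha * v for v in values]
    m = max(scaled)  # subtract the max before exponentiating to avoid overflow
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def frequency_softmax_filter(feature_map: List[List[float]], alpha: float) -> List[List[float]]:
    """Reweight each frequency row of a (freq x time) feature map.

    HYPOTHETICAL sketch: the per-bin statistic (mean over time) and the
    rescaling by the number of bins are assumptions, not the paper's method.
    """
    n_freq = len(feature_map)
    # Summarize each frequency bin by its mean value over the time frames.
    bin_means = [sum(row) / len(row) for row in feature_map]
    # alpha stands in for the "adaptive coefficient"; here it is simply passed in.
    weights = softmax(bin_means, alpha)
    # Multiply by n_freq so that uniform weights (alpha = 0) leave the map unchanged.
    return [[x * w * n_freq for x in row] for row, w in zip(feature_map, weights)]


if __name__ == "__main__":
    spec = [[1.0, 1.0], [3.0, 3.0]]  # two frequency bins, two time frames
    print(frequency_softmax_filter(spec, 1.0))
```

With `alpha = 0` the weights are uniform and the map is returned unchanged (up to floating-point rounding); larger `alpha` values sharpen the contrast between high- and low-energy bins, which is one way a single coefficient can control how strongly frequency differences are emphasized.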
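The reported result is an equal error rate (EER): the operating point at which the false acceptance rate on spoofed trials equals the false rejection rate on bona fide trials. A minimal threshold-sweep sketch, assuming higher scores mean "more likely bona fide"; the function name and score convention are illustrative, and official evaluation toolkits may locate the crossing point slightly differently (for example by interpolating along the DET curve).

```python
from typing import List, Tuple


def compute_eer(bonafide_scores: List[float], spoof_scores: List[float]) -> Tuple[float, float]:
    """Return (eer, threshold) from a simple sweep over observed scores.

    A trial is accepted as bona fide when score >= threshold.
    FAR = fraction of spoofed trials accepted.
    FRR = fraction of bona fide trials rejected.
    The EER is taken at the threshold where |FAR - FRR| is smallest,
    reported as the mean of the two rates at that point.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None  # (gap, eer, threshold)
    for t in thresholds:
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    eer, threshold = compute_eer([0.9, 0.8, 0.7, 0.3], [0.6, 0.4, 0.2, 0.1])
    print(f"EER = {eer:.2%} at threshold {threshold}")
```

For bona fide scores `[0.9, 0.8, 0.7, 0.3]` and spoof scores `[0.6, 0.4, 0.2, 0.1]`, the sweep finds FAR = FRR = 0.25 at threshold 0.6, i.e. an EER of 25%. By the same definition, an EER of 1.62% means the two error rates meet at about 1.6%.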