Computer science
Softmax function
Spoofing attack
Overfitting
Artificial intelligence
Pooling
Machine learning
Feature (linguistics)
Artificial neural network
Dropout (neural networks)
Pattern recognition (psychology)
Speech recognition
Computer network
Linguistics
Philosophy
Authors
Guoyuan Lin, Weiqi Luo, Da Luo, Jiwu Huang
Identifier
DOI:10.1109/tifs.2024.3352429
Abstract
Existing deep learning models for spoofing speech detection often struggle to effectively generalize to unseen spoofing attacks that were not present during the training stage. Moreover, the presence of class imbalance further compounds this issue by biasing the learning process towards seen attack samples. To address these challenges, we present an innovative end-to-end model called One-Class Neural Network with Directed Statistics Pooling (OCNet-DSP). Our model incorporates a feature cropping operation to attenuate high-frequency components, mitigating the risk of overfitting. Additionally, leveraging the time-frequency characteristics of speech signals, we introduce a directed statistics pooling layer that extracts more effective features for distinguishing between bonafide and spoofing classes. We also propose the Threshold One-class Softmax loss, which mitigates class imbalance by reducing the optimization weight of spoofing samples during training. Extensive comparative results demonstrate that the proposed model outperforms all existing single models, achieving an equal error rate of 0.44% and a minimum detection cost function of 0.0145 for the ASVspoof 2019 logical access database. Moreover, the proposed ensemble version, which accommodates speech inputs of varying lengths in each submodel, maintains state-of-the-art performance among reproducible ensemble models. Additionally, numerous ablation experiments, along with a cross-dataset experiment, are conducted to validate the rationality and effectiveness of the proposed model.
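The abstract names a directed statistics pooling layer and a Threshold One-class Softmax loss, but gives neither in closed form. As a rough illustration of the conventional building blocks these components extend, the sketch below shows plain mean+std statistics pooling over the time axis and an OC-Softmax-style one-class margin loss. The class names, margins (m_real, m_fake), and scale alpha are illustrative assumptions, not the authors' implementation of OCNet-DSP.

```python
# Hypothetical sketch (not the paper's exact method): conventional statistics
# pooling over time plus a one-class softmax-style margin loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StatisticsPooling(nn.Module):
    """Pools a (batch, channels, time) feature map into a fixed-length
    embedding by concatenating per-channel mean and standard deviation."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mean = x.mean(dim=-1)
        std = x.std(dim=-1)
        return torch.cat([mean, std], dim=1)  # (batch, 2 * channels)

class OneClassSoftmaxLoss(nn.Module):
    """One-class margin loss: pulls bonafide embeddings (label 0) toward a
    learned direction w with margin m_real, and pushes spoofed embeddings
    (label 1) away from it with margin m_fake."""
    def __init__(self, embed_dim: int, m_real: float = 0.9,
                 m_fake: float = 0.2, alpha: float = 20.0):
        super().__init__()
        self.w = nn.Parameter(torch.randn(embed_dim))
        self.m_real, self.m_fake, self.alpha = m_real, m_fake, alpha

    def forward(self, emb: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        w = F.normalize(self.w, dim=0)
        emb = F.normalize(emb, dim=1)
        cos = emb @ w  # cosine similarity to the bonafide direction
        # Bonafide: penalize cos < m_real; spoof: penalize cos > m_fake.
        margin = torch.where(labels == 0, self.m_real - cos, cos - self.m_fake)
        return F.softplus(self.alpha * margin).mean()

# Usage sketch: pool frame-level features, then compute the loss.
feats = torch.randn(8, 64, 400)                 # (batch, channels, frames)
emb = StatisticsPooling()(feats)                # (8, 128)
loss = OneClassSoftmaxLoss(embed_dim=128)(emb, torch.randint(0, 2, (8,)))
```

The paper's Threshold One-class Softmax is described as further reducing the optimization weight of spoofing samples to counter class imbalance, and its directed statistics pooling exploits the time-frequency structure of speech; both differ from the generic forms sketched here.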