Authors
Yuankun Xie,Haonan Cheng,Yutian Wang,Long Ye
Identifier
DOI:10.1109/tifs.2023.3324724
Abstract
In this paper, we propose an Aggregation and Separation Domain Generalization (ASDG) method for Audio DeepFake Detection (ADD). Fake speech generated by different methods exhibits amplitude and frequency distributions that differ from those of genuine speech. In addition, the spoofing attacks in training sets may not keep pace with the evolving diversity of real-world deepfake distributions. In light of this, we attempt to learn an ideal feature space that aggregates real speech and separates fake speech, so as to achieve better generalizability in the detection of unseen target domains. Specifically, we first propose a feature generator based on Lightweight Convolutional Neural Networks (LCNN), which generates a feature space and categorizes features as real or fake. Meanwhile, single-side domain adversarial learning is leveraged to make only the real speech from different domains indistinguishable, so that the distribution of real speech is aggregated in the feature space. Furthermore, a triplet loss is adopted to separate the distribution of fake speech while aggregating that of real speech. Finally, to test the generalizability of the model, we train it on three different English datasets and evaluate it under harsh conditions: cross-language and noisy datasets. Extensive experiments show that ASDG outperforms the baseline models in cross-domain tasks and decreases the Equal Error Rate (EER) by up to 39.24% compared to RawNet2. This demonstrates that the proposed Aggregation and Separation Domain Generalization method is an effective strategy for improving model generalizability.
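The triplet objective described in the abstract can be illustrated with a standard triplet margin loss, where the anchor and positive are real-speech embeddings from different domains (to be pulled together) and the negative is a fake-speech embedding (to be pushed away). The following is a minimal NumPy sketch, not the authors' implementation; the function name and toy embeddings are ours:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Standard triplet margin loss: the distance to the positive
    # (real speech from another domain) should be smaller than the
    # distance to the negative (fake speech) by at least `margin`.
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return float(max(d_pos - d_neg + margin, 0.0))

# Toy 2-D embeddings: two real-speech vectors from different domains
# and one fake-speech vector.
real_a = np.array([0.0, 0.0])
real_b = np.array([0.2, 0.0])
fake   = np.array([3.0, 0.0])

# Real pair is close and the fake lies beyond the margin, so the loss
# is already zero: max(0.2 - 3.0 + 1.0, 0) = 0.
print(triplet_loss(real_a, real_b, fake))  # 0.0
```

Minimizing this loss over mini-batches has exactly the aggregation/separation effect the paper targets: real embeddings from all domains collapse toward each other while fake embeddings are driven outside the margin.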