Stereophonic recording
Computer science
Encoder
Channel (broadcasting)
Speech recognition
Invariant (physics)
Perception
Artificial intelligence
Mathematics
Telecommunications
Psychology
Neuroscience
Mathematical physics
Operating system
Authors
Rui Liu, Jinhua Zhang, Guanglai Gao
Identifier
DOI: 10.1016/j.inffus.2024.102257
Abstract
Audio deepfake detection (ADD), an emerging research topic, aims to detect fake audio generated by text-to-speech (TTS), voice conversion (VC), and related techniques. Traditional approaches read the mono signal and analyze its artifacts directly. Recently, mono-to-binaural conversion based ADD has attracted increasing attention, since binaural audio signals provide a unique and comprehensive perspective on speech perception. Such methods first convert the mono audio into binaural audio and then process the left and right channels separately to discover authenticity cues. However, the acoustic information in the two channels exhibits both differences and similarities, which previous research has not thoroughly explored. To address this issue, we propose a new mono-to-binaural conversion based ADD framework built on multi-space channel representation learning, termed "MSCR-ADD". Specifically, (1) the feature representations of the respective channels are learned by channel-specific encoders and stored in the channel-specific space; (2) the feature representations capturing the difference between the two channels are learned by the channel-differential encoder and stored in the channel-differential space; and (3) the channel-invariant encoder learns the channel-commonality representations in the channel-invariant space. We further propose orthogonality and mutual information maximization losses to constrain the channel-specific and channel-invariant encoders. Finally, the three representations from the different spaces are fused to perform deepfake detection. Notably, the feature representations in the channel-differential and channel-invariant spaces unveil the differences and similarities between the two channels of binaural audio, enabling us to effectively detect artifacts in fake audio. Experimental results on four benchmark datasets demonstrate that MSCR-ADD is superior to existing state-of-the-art approaches.
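As a rough illustration of the multi-space idea described in the abstract, the sketch below (PyTorch, not taken from the paper) wires up channel-specific, channel-differential, and channel-invariant encoders whose embeddings are fused for real/fake classification, together with simple stand-ins for the orthogonality and mutual-information constraints. All module names, feature dimensions, the GRU encoders, and the cosine-similarity proxy for MI maximization are assumptions made for this example; the authors' actual architecture and losses may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelEncoder(nn.Module):
    """Frame-level encoder producing one fixed-size embedding per utterance."""
    def __init__(self, feat_dim=80, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)

    def forward(self, x):              # x: (batch, frames, feat_dim)
        _, h = self.rnn(x)             # h: (1, batch, hidden)
        return h.squeeze(0)            # (batch, hidden)


class MSCRADDSketch(nn.Module):
    """Hypothetical multi-space channel representation model for binaural ADD."""
    def __init__(self, feat_dim=80, hidden=128):
        super().__init__()
        self.spec_left = ChannelEncoder(feat_dim, hidden)    # channel-specific (left)
        self.spec_right = ChannelEncoder(feat_dim, hidden)   # channel-specific (right)
        self.diff_enc = ChannelEncoder(feat_dim, hidden)     # channel-differential
        self.inv_enc = ChannelEncoder(feat_dim, hidden)      # channel-invariant (shared)
        self.classifier = nn.Linear(hidden * 5, 2)           # fused -> real vs. fake

    def forward(self, left, right):    # left/right: (batch, frames, feat_dim)
        s_l = self.spec_left(left)
        s_r = self.spec_right(right)
        d = self.diff_enc(left - right)            # difference of the two channels
        i_l = self.inv_enc(left)                   # same encoder applied to both
        i_r = self.inv_enc(right)                  # channels to capture commonality
        fused = torch.cat([s_l, s_r, d, i_l, i_r], dim=-1)
        return self.classifier(fused), (s_l, s_r, i_l, i_r)


def orthogonality_loss(spec, inv):
    """Penalize correlation between channel-specific and channel-invariant
    embeddings (squared cross-correlation of the normalized batches)."""
    spec = F.normalize(spec, dim=-1)
    inv = F.normalize(inv, dim=-1)
    return (spec.t() @ inv).pow(2).mean()


def mi_proxy_loss(i_l, i_r):
    """Cosine-similarity stand-in that pulls the invariant embeddings of the
    two channels together (a crude proxy for MI maximization)."""
    return 1.0 - F.cosine_similarity(i_l, i_r, dim=-1).mean()


if __name__ == "__main__":
    model = MSCRADDSketch()
    left = torch.randn(4, 200, 80)     # hypothetical log-mel features, left channel
    right = torch.randn(4, 200, 80)    # right channel from mono-to-binaural conversion
    labels = torch.randint(0, 2, (4,))
    logits, (s_l, s_r, i_l, i_r) = model(left, right)
    loss = (F.cross_entropy(logits, labels)
            + orthogonality_loss(torch.cat([s_l, s_r]), torch.cat([i_l, i_r]))
            + mi_proxy_loss(i_l, i_r))
    loss.backward()
```

In this sketch the three spaces correspond to the three groups of embeddings: channel-specific (s_l, s_r), channel-differential (d), and channel-invariant (i_l, i_r); the auxiliary losses keep the specific and invariant embeddings disentangled while encouraging the two channels to agree in the invariant space.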