Computer Science
Natural Language Processing
Emotion Detection
Artificial Intelligence
Emotion Recognition
Speech Recognition
Authors
Emanuele Conti, Davide Salvi, Clara Borrelli, Brian Hosler, Paolo Bestagini, Fabio Antonacci, Augusto Sarti, Matthew C. Stamm, Stefano Tubaro
Identifier
DOI:10.1109/icassp43922.2022.9747186
Abstract
In recent years, audio and video deepfake technology has advanced relentlessly, severely impacting people's reputation and reliability. Several factors have facilitated the growing deepfake threat. On the one hand, the hyper-connected society of social and mass media enables the spread of multimedia content worldwide in real time, facilitating the dissemination of counterfeit material. On the other hand, neural network-based techniques have made deepfakes easier to produce and harder to detect, showing that the analysis of low-level features is no longer sufficient for the task. This situation makes it crucial to design systems that can detect deepfakes at both the video and audio level. In this paper, we propose a new audio spoofing detection system leveraging emotional features. The rationale behind the proposed method is that audio deepfake techniques cannot correctly synthesize natural emotional behavior. Therefore, we feed our deepfake detector with high-level features obtained from a state-of-the-art Speech Emotion Recognition (SER) system. Since these descriptors capture semantic audio information, the proposed system proves robust in cross-dataset scenarios, outperforming the considered baseline on multiple datasets.
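The abstract describes a two-stage pipeline: a pretrained SER network produces high-level emotional embeddings, which a separate back-end classifier uses to decide bona fide vs. deepfake. The sketch below illustrates that idea only in outline, with simulated embeddings standing in for a real SER front end (the embedding dimension, class separation, and nearest-centroid back end are all assumptions for illustration, not the paper's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16  # embedding size, chosen arbitrarily for this sketch

# Hypothetical stand-in for the SER front end: in the paper's pipeline the
# high-level emotional descriptors come from a pretrained Speech Emotion
# Recognition network; here we simulate them with a class-dependent shift,
# mimicking the assumption that fakes fail to reproduce natural emotion.
def simulated_ser_embeddings(n, bona_fide):
    center = 0.5 if bona_fide else -0.5  # assumed separation, illustration only
    return rng.normal(loc=center, scale=1.0, size=(n, DIM))

# Labeled training data: 1 = bona fide speech, 0 = deepfake.
X = np.vstack([simulated_ser_embeddings(200, True),
               simulated_ser_embeddings(200, False)])
y = np.array([1] * 200 + [0] * 200)

# Minimal back-end classifier: nearest class centroid in embedding space.
centroids = {c: X[y == c].mean(axis=0) for c in (0, 1)}

def predict(embedding):
    return min(centroids, key=lambda c: np.linalg.norm(embedding - centroids[c]))

accuracy = float(np.mean([predict(x) == label for x, label in zip(X, y)]))
print(f"training accuracy: {accuracy:.2f}")
```

The point of the sketch is the separation of concerns: because the features are semantic (emotional behavior) rather than low-level signal statistics, even a simple back end can generalize, which is the property the paper credits for its cross-dataset robustness.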