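Computer science
Spectrogram
Speech recognition
Audio signal
Audio signal processing
Scaling
Leverage (statistics)
Signal (programming language)
Sound recording and reproduction
Speech coding
Artificial intelligence
Acoustics
Geometry
Mathematics
Physics
Programming language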
Authors
Michele Pilia, Sara Mandelli, Paolo Bestagini, Stefano Tubaro
Identifier
DOI:10.1109/wifs53200.2021.9648389
Abstract
The widespread diffusion of user-friendly editing software for audio signals has made audio tampering accessible to anyone. It is therefore increasingly necessary to develop forensic methodologies that verify whether a given audio content has been digitally manipulated. Among the many available audio editing techniques, a very common one is time scaling, i.e., altering the temporal evolution of an audio signal without affecting any pitch component. For instance, this can be used to slow down or speed up speech recordings, thus enabling the creation of natural-sounding fake speech compositions. In this work, we propose to blindly detect and estimate the time scaling applied to an audio signal. To expose time scaling, we leverage a Convolutional Neural Network that analyzes the Log-Mel Spectrogram and the phase of the Short-Time Fourier Transform of the input audio signal. The proposed technique is tested on different audio datasets, considering various time scaling implementations and challenging cross-test scenarios.
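The two inputs named in the abstract, the Log-Mel Spectrogram and the STFT phase, can be illustrated with a minimal NumPy sketch. The abstract does not state the sampling rate, window, frame length, hop size, or number of mel bands, so every value below is an assumption chosen for illustration, and the function names (`stft`, `mel_filterbank`, `extract_features`) are hypothetical. The Convolutional Neural Network itself is not shown.

```python
import numpy as np


def stft(x, n_fft=512, hop=128):
    """Complex STFT with a Hann window, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1).T


def mel_filterbank(sr, n_fft, n_mels=64):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    # Filter edges are equally spaced on the mel scale from 0 Hz to Nyquist.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)

    bank = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        lo, center, hi = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def extract_features(x, sr=16000, n_fft=512, hop=128, n_mels=64):
    """Return (log_mel, phase) for a mono signal x.

    All parameter defaults are assumptions, not values from the paper.
    """
    spec = stft(x, n_fft, hop)
    power = np.abs(spec) ** 2
    # Small floor avoids log(0) for empty or silent bands.
    log_mel = np.log(mel_filterbank(sr, n_fft, n_mels) @ power + 1e-10)
    phase = np.angle(spec)  # values in [-pi, pi]
    return log_mel, phase
```

For example, calling `extract_features` on one second of a 440 Hz sine sampled at 16 kHz yields a log-mel array with 64 rows and a phase array with 257 rows, both with the same number of time frames. A detector along the lines described in the abstract would take such arrays as network input; how the two are combined is not specified there.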