Monaural Speech Dereverberation Using Deformable Convolutional Networks

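Keywords: Computer science; Speech recognition; Speech enhancement; Intelligibility (philosophy); Spectrogram; Monaural; Reverberation; Speech processing; Channel; Artificial intelligence; Acoustics; Noise reduction; Epistemology; Physics; Philosophy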

Authors

Vinay Kothapally, John H. L. Hansen

Source

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing [Institute of Electrical and Electronics Engineers]
Date: 2024-01-01 Volume/Issue: 32: 1712-1723 Citations: 2

Identifier

DOI: 10.1109/taslp.2024.3358720

Abstract

Reverberation and background noise can degrade speech quality and intelligibility when speech is captured by a distant microphone. In recent years, researchers have developed several deep learning (DL)-based single-channel speech dereverberation systems that aim to minimize distortions introduced into speech captured in naturalistic environments. A majority of these DL-based systems enhance an unseen distorted speech signal by applying a predetermined set of weights to regions of the speech spectrogram, regardless of the degree of distortion within the respective regions. Such a system might not be an ideal solution for the dereverberation task. To address this, we present a DL-based end-to-end single-channel speech dereverberation system that uses deformable convolutional networks (DCNs), which dynamically adjust their receptive field based on the degree of distortion within an unseen speech signal. The proposed system includes the following components to simultaneously enhance the magnitude and phase responses of speech, which leads to improved perceptual quality: (i) a complex spectrum enhancement module that uses a multi-frame filtering technique to implicitly correct the phase response, (ii) a magnitude enhancement module that suppresses dominant reflections and recovers the formant structure using a deep filtering (DF) technique, and (iii) a speech activity detection (SAD) estimation module that predicts frame-wise speech activity to suppress residuals in non-speech regions. We assess the performance of the proposed system by employing objective speech quality metrics on both simulated and real speech recordings from the REVERB challenge corpus. The experimental results demonstrate the benefits of using DCNs and multi-frame filtering for the speech dereverberation task. We compare the performance of our proposed system against other signal processing (SP) and DL-based systems and observe that it consistently outperforms other approaches across all speech quality metrics.
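
The abstract mentions multi-frame filtering of the complex spectrum without giving details. As a rough illustration only, and not the authors' implementation, the sketch below applies a per-bin complex filter over the current and previous frames of a spectrogram using NumPy. The function name `multi_frame_filter`, the tap layout `(N, T, F)`, and the zero-padding of past frames are all assumptions made for this example; in a DL-based system the filter taps would typically be predicted by a network rather than supplied by hand.

```python
import numpy as np

def multi_frame_filter(spec, filt):
    """Apply a complex multi-frame filter to a spectrogram (illustrative sketch).

    spec: complex array of shape (T, F), frames by frequency bins.
    filt: complex array of shape (N, T, F); tap i weights frame t - i.
    Returns a complex array of shape (T, F) where
        out[t, f] = sum_i filt[i, t, f] * spec[t - i, f]
    and frames before the start of the signal are treated as zero.
    """
    num_taps, num_frames, num_bins = filt.shape
    assert spec.shape == (num_frames, num_bins)
    # Zero-pad the past so that early frames have a full filter history.
    pad = np.zeros((num_taps - 1, num_bins), dtype=spec.dtype)
    padded = np.concatenate([pad, spec], axis=0)
    out = np.zeros(spec.shape, dtype=np.result_type(spec, filt))
    for i in range(num_taps):
        # padded[k] == spec[k - (num_taps - 1)], so spec[t - i] starts here.
        start = num_taps - 1 - i
        out = out + filt[i] * padded[start:start + num_frames]
    return out

# Hypothetical example data: 6 frames, 4 frequency bins, 3 filter taps.
rng = np.random.default_rng(0)
spec = rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))

# A filter whose only non-zero tap is the current frame leaves spec unchanged.
identity = np.zeros((3, 6, 4), dtype=complex)
identity[0] = 1.0
same = multi_frame_filter(spec, identity)

# A filter whose only non-zero tap is the previous frame delays spec by one frame.
delay = np.zeros((3, 6, 4), dtype=complex)
delay[1] = 1.0
shifted = multi_frame_filter(spec, delay)
```

Because each tap is complex, the weighted sum can change both magnitude and phase of every time-frequency bin, which is one plausible reading of how such filtering corrects phase implicitly.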
