Improving Monaural Speech Enhancement with Dynamic Scene Perception Module
Monaural
Computer science
Speech enhancement
Speech recognition
Perception
Artificial intelligence
Psychology
Noise reduction
Neuroscience
Authors
Tian Lan, Jiajia Li, Wenxin Tai, Cong Chen, J. Kang, Qiao Liu
Identifier
DOI:10.1109/icme52920.2022.9858924
Abstract
Speech enhancement aims to recover clean speech from complex noise backgrounds. This paper proposes a novel information processing module, the dynamic scene perception module (DSPM), that helps existing systems adapt to a variety of complex scenarios. DSPM is motivated by the observation that different regions of the noisy spectrum, in different scenarios, have different enhancement requirements. Concretely, DSPM consists of two parts: one for dynamic scene estimation and one for adaptive region perception. The scene estimator uses a spectrum-energy-based attention mechanism to obtain a coefficient for each convolution kernel. Then, at each position, the region perceptron chooses the corresponding kernels according to the requirement of the current region (preserve vocals or suppress noise). Systematic evaluations on the TIMIT corpus and on Voice Bank + DEMAND demonstrate the effectiveness of the method. Compared with existing systems, the proposed method achieves better performance under various SNR conditions and complex noise scenarios.
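The two-stage idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no architectural details, so the function names, the projection matrices `w_scene` and `w_region`, the use of mean per-bin energy as the attention input, and the soft (softmax) mixture in place of a hard kernel choice are all assumptions made for illustration. In a real system these weights would presumably be learned.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scene_coefficients(spec, w_scene):
    # Scene estimation (assumed form): map the average energy of each
    # frequency bin to one attention coefficient per kernel.
    # spec: (F, T) magnitude spectrogram; w_scene: (K, F), hypothetical.
    energy = (spec ** 2).mean(axis=1)           # (F,)
    return softmax(w_scene @ energy)            # (K,), sums to 1

def conv2d_same(x, kernel):
    # Naive zero-padded "same" 2-D cross-correlation for odd-sized kernels.
    kh, kw = kernel.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def dspm_sketch(spec, kernels, w_scene, w_region):
    # kernels: (K, kh, kw); w_region: (K,), hypothetical per-kernel gains.
    coeffs = scene_coefficients(spec, w_scene)                      # (K,)
    responses = np.stack([conv2d_same(spec, k) for k in kernels])   # (K, F, T)
    # Region perception (assumed form): at each time-frequency position,
    # combine the global scene coefficients with the local energy to get
    # a soft selection over the K kernel responses.
    local_energy = spec ** 2                                        # (F, T)
    logits = (w_region[:, None, None] * local_energy[None]
              + np.log(coeffs)[:, None, None])
    region = softmax(logits, axis=0)                                # (K, F, T)
    enhanced = (region * responses).sum(axis=0)                     # (F, T)
    return enhanced, coeffs, region
```

In this sketch the scene coefficients act as a global prior over kernels, while the local energy term lets high-energy regions (likely speech) and low-energy regions (likely noise) weight the kernels differently. A hard selection, as the word "chooses" in the abstract may suggest, could be approximated by taking the argmax over the kernel axis instead of the softmax mixture.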