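Microphone array
Computer science
Beamforming
Speech recognition
Channel (broadcasting)
Distortion (music)
Microphone
Pattern recognition (psychology)
Speech enhancement
Signal (programming language)
Algorithm
Artificial intelligence
Telecommunications
Amplifier
Noise reduction
Sound pressure
Bandwidth (computing)
Programming language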
Authors
Zhong-Qiu Wang, Peidong Wang, DeLiang Wang
Identifier
DOI: 10.1109/taslp.2020.2998279
Abstract
This study proposes a complex spectral mapping approach for single- and multi-channel speech enhancement, where deep neural networks (DNNs) are used to predict the real and imaginary (RI) components of the direct-path signal from those of the noisy and reverberant mixture. The proposed system contains two DNNs. The first one performs single-channel complex spectral mapping. The estimated complex spectra are used to compute a minimum variance distortionless response (MVDR) beamformer. The RI components of the beamforming results, which encode spatial information, are then combined with the RI components of the mixture to train the second DNN for multi-channel complex spectral mapping. With the estimated complex spectra, we also propose a novel method of time-varying beamforming. State-of-the-art performance is obtained on the speech enhancement and recognition tasks of the CHiME-4 corpus. More specifically, our system obtains 6.82%, 3.19% and 1.99% word error rates (WER) on the single-, two-, and six-microphone tasks of CHiME-4, respectively, significantly surpassing the current best results of 9.15%, 3.91% and 2.24% WER.
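The abstract states that the estimated complex spectra are used to compute an MVDR beamformer but gives no formulas. The following is a minimal sketch of one common way to do this, not the authors' implementation. It assumes that the noise estimate is the mixture minus the estimated speech, that covariance matrices are time-invariant averages over all frames, and that the steering vector is the principal eigenvector of the speech covariance matrix. The function names `mvdr_weights` and `apply_beamformer`, the array layout `(mics, frames, freqs)`, and the regularisation constant `eps` are all hypothetical choices made for illustration.

```python
import numpy as np


def mvdr_weights(speech_est, mixture, ref_mic=0, eps=1e-6):
    """Time-invariant MVDR weights from estimated complex spectra.

    speech_est, mixture: complex STFTs of shape (mics, frames, freqs).
    Returns weights of shape (freqs, mics).
    Assumption: noise is estimated as mixture minus estimated speech.
    """
    noise_est = mixture - speech_est
    n_mics, n_frames, n_freqs = mixture.shape
    weights = np.zeros((n_freqs, n_mics), dtype=complex)
    for f in range(n_freqs):
        s = speech_est[:, :, f]  # (mics, frames)
        n = noise_est[:, :, f]
        # Spatial covariance matrices averaged over all frames.
        phi_s = s @ s.conj().T / n_frames
        phi_n = n @ n.conj().T / n_frames
        # Diagonal loading keeps the noise covariance invertible.
        load = eps * np.trace(phi_n).real / n_mics + 1e-10
        phi_n = phi_n + load * np.eye(n_mics)
        # Assumption: steering vector = principal eigenvector of phi_s,
        # scaled so that its reference-microphone entry equals 1.
        _, vecs = np.linalg.eigh(phi_s)
        d = vecs[:, -1]
        d = d / d[ref_mic]
        # w = phi_n^{-1} d / (d^H phi_n^{-1} d)
        num = np.linalg.solve(phi_n, d)
        weights[f] = num / (d.conj() @ num)
    return weights


def apply_beamformer(weights, mixture):
    """Compute w[f]^H y[t, f]; returns an array of shape (frames, freqs)."""
    return np.einsum("fm,mtf->tf", weights.conj(), mixture)
```

Under these assumptions the beamformer passes the target component at the reference microphone unchanged (the distortionless constraint) while minimising the output noise power. In a two-stage pipeline of the kind the abstract describes, the real and imaginary parts of `apply_beamformer(weights, mixture)` could then be stacked with those of the mixture as input features for a second network.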