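Spectrogram
Computer science
Discriminative model
Speech recognition
Autoregressive model
Noise reduction
Artificial intelligence
Inference
Pattern recognition (psychology)
Signal processing
Adaptation (eye)
Speech processing
Speech enhancement
Machine learning
Digital signal processing
Mathematics
Optics
Physics
Econometrics
Computer hardware
Authors
Dario Rethage, Jordi Pons, Xavier Serra
Identifier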
DOI:10.1109/icassp.2018.8462417
Abstract
Most speech processing techniques use magnitude spectrograms as a front-end and therefore discard part of the signal by default: the phase. To overcome this limitation, we propose an end-to-end learning method for speech denoising based on Wavenet. The proposed model adaptation retains Wavenet's powerful acoustic modeling capabilities, while significantly reducing its time-complexity by eliminating its autoregressive nature. Specifically, the model makes use of non-causal, dilated convolutions and predicts target fields instead of a single target sample. The discriminative adaptation of the model we propose learns in a supervised fashion by minimizing a regression loss. These modifications make the model highly parallelizable during both training and inference. Both quantitative and qualitative evaluations indicate that the proposed method is preferred over Wiener filtering, a common method based on processing the magnitude spectrogram.
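The two structural ideas named in the abstract, non-causal dilated convolutions and target-field prediction, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the kernel size of 3, the dilation schedule, the identity weights, and the use of mean absolute error as the regression loss are all assumptions chosen for illustration, and the gated activations and residual connections of a real Wavenet are omitted. All function names are hypothetical.

```python
def dilated_conv_noncausal(x, weights, dilation):
    """Kernel-size-3 convolution without padding.

    Non-causal: each output sample sees one past sample (t - dilation)
    and one future sample (t + dilation) around the current one.
    """
    w_past, w_now, w_future = weights
    return [
        w_past * x[t - dilation] + w_now * x[t] + w_future * x[t + dilation]
        for t in range(dilation, len(x) - dilation)
    ]


def receptive_field(dilations):
    """Input samples that influence one output sample (kernel size 3 assumed)."""
    return 1 + 2 * sum(dilations)


def denoise_field(noisy, layer_weights, dilations):
    """Map a noisy window to a whole field of output samples in one pass.

    No output is fed back as input, so the stack is not autoregressive
    and every sample of the field can be computed in parallel.
    """
    h = list(noisy)
    for weights, d in zip(layer_weights, dilations):
        h = dilated_conv_noncausal(h, weights, d)
    return h


def mean_absolute_error(pred, target):
    """Assumed regression loss; the abstract does not name a specific one."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


# Hypothetical configuration, chosen only to keep the numbers small.
dilations = [1, 2, 4]
target_field = 5
rf = receptive_field(dilations)
input_length = rf + target_field - 1

noisy = [float(i % 4) for i in range(input_length)]
# Identity weights stand in for learned parameters.
identity = [(0.0, 1.0, 0.0)] * len(dilations)
field = denoise_field(noisy, identity, dilations)
loss = mean_absolute_error(field, noisy[sum(dilations):-sum(dilations)])
```

With these assumed settings, a window of `rf + target_field - 1` input samples yields `target_field` output samples from a single forward pass, which is the source of the parallelism the abstract describes. An autoregressive model would instead need one pass per output sample.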