DeepResGRU: Residual gated recurrent neural network-augmented Kalman filtering for speech enhancement and recognition

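Computer science, Speech recognition, Intelligibility (philosophy), Word error rate, Autoregressive model, Residual, Kalman filter, Speech enhancement, Noise (video), Noise reduction, Artificial intelligence, Algorithm, Mathematics, Statistics, Epistemology, Image (mathematics), Philosophy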

Authors

Nasir Saleem,Jiechao Gao,Muhammad Irfan Khattak,Hafiz Tayyab Rauf,Seifedine Kadry,Muhammad Shafi

Source

Journal: Knowledge Based Systems [Elsevier]
Date: 2021-12-11 Volume/issue: 238: 107914-107914 Citations: 44

Identifier

DOI: 10.1016/j.knosys.2021.107914

Abstract

With recent research developments, deep learning models have become powerful alternatives for speech enhancement and recognition in many real-world applications. Although state-of-the-art models achieve phenomenal results in terms of background noise reduction, the challenge is to design robust models that improve quality, intelligibility, and word error rate. We propose a novel residual connection-based Bidirectional Gated Recurrent Unit (BiGRU) augmented Kalman filtering model for speech enhancement and recognition. In the proposed model, the clean speech and noise signals are modeled as autoregressive processes whose parameters consist of linear prediction coefficients (LPCs) and driving noise variances. Recurrent neural networks are trained to estimate the line spectrum frequencies (LSFs), whereas an optimization problem is solved to obtain the noise variances so as to minimize the divergence between the modeled and predicted autoregressive spectra of the noise-contaminated speech. Augmented Kalman filtering with the estimated parameters is applied to the noisy speech to reduce background noise and thereby improve speech quality, intelligibility, and word error rate. A bidirectional GRU network is implemented, which predicts the parameters using both the future and past contexts of the input sequence and excels at modeling long-term dependencies. A compensated phase spectrum is used to recover the enhanced speech signals. The Kaldi toolkit is employed to train the automatic speech recognition (ASR) system in order to measure the word error rates (WERs). On the LibriSpeech dataset, the proposed model improved quality, intelligibility, and word error rate by 35.52%, 18.79%, and 19.13%, respectively, under various noisy environments.
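The Kalman filtering step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It makes two simplifying assumptions: the autoregressive coefficients and variances are known in advance (the paper estimates them with a BiGRU and an optimization step), and the additive noise is white (the paper also models the noise as an autoregressive process and uses an augmented state). The function name `kalman_ar_denoise` and all parameter values are hypothetical.

```python
import numpy as np

def kalman_ar_denoise(y, a, q, r):
    """Kalman-filter noisy samples y under an assumed AR(p) signal model.

    Assumed model (a simplification of the abstract's setup):
      s[n] = a[0]*s[n-1] + ... + a[p-1]*s[n-p] + w[n],  Var(w) = q
      y[n] = s[n] + v[n],                               Var(v) = r (white)
    """
    p = len(a)
    # Companion-form transition matrix: first row holds the AR coefficients,
    # the sub-diagonal shifts past samples down the state vector.
    F = np.zeros((p, p))
    F[0, :] = a
    F[1:, :-1] = np.eye(p - 1)
    H = np.zeros((1, p))
    H[0, 0] = 1.0                      # only the newest sample is observed
    Q = np.zeros((p, p))
    Q[0, 0] = q                        # driving noise enters the newest sample
    x = np.zeros((p, 1))               # state estimate [s[n], ..., s[n-p+1]]
    P = np.eye(p)                      # state covariance
    out = np.empty(len(y))
    for n, yn in enumerate(y):
        # Prediction step
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step with the scalar observation yn
        S = (H @ P @ H.T).item() + r
        K = P @ H.T / S
        x = x + K * (yn - (H @ x).item())
        P = (np.eye(p) - K @ H) @ P
        out[n] = x[0, 0]
    return out

# Synthetic AR(2) signal standing in for speech (illustrative values only).
rng = np.random.default_rng(0)
a = np.array([1.6, -0.8])
q, r = 1.0, 4.0
n_samples = 2000
w = rng.normal(0.0, np.sqrt(q), n_samples)
clean = np.zeros(n_samples)
for n in range(2, n_samples):
    clean[n] = a[0] * clean[n - 1] + a[1] * clean[n - 2] + w[n]
noisy = clean + rng.normal(0.0, np.sqrt(r), n_samples)

enhanced = kalman_ar_denoise(noisy, a, q, r)
mse_noisy = np.mean((noisy - clean) ** 2)
mse_filtered = np.mean((enhanced - clean) ** 2)
print(f"MSE noisy: {mse_noisy:.3f}, MSE filtered: {mse_filtered:.3f}")
```

In this toy setting the filter is given the true model parameters, so the filtered mean squared error should fall below that of the noisy input. With real speech the parameters would have to be estimated frame by frame, which is the role the abstract assigns to the recurrent network.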
