PESQ
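Computer science
Speech enhancement
Speech recognition
Intelligibility (philosophy)
Discrete cosine transform
Computation
Convolution (computer science)
Deep learning
Artificial intelligence
Algorithm
Noise reduction
Pattern recognition (psychology)
Epistemology
Artificial neural network
Image (mathematics)
Philosophy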
Authors
Chaitanya Jannu, Sunny Dayal Vanambathina
Source
Journal: Journal of Intelligent and Fuzzy Systems
[IOS Press]
Date: 2023-05-09
Volume (issue): 45 (1): 1195-1208
Citations: 2
Abstract
Over the past ten years, deep learning has enabled significant advances in enhancing noisy speech. Because speech signals are stable only over short time spans, earlier speech enhancement (SE) methods estimated only the magnitude and reused the phase of the noisy mixture when reconstructing the speech. The performance of these approaches is limited because the phase also carries some of the speech information. Later SE approaches were developed to estimate magnitude and phase jointly. More recently, complex-valued models such as the deep complex convolution recurrent network (DCCRN) have been proposed, but their computational cost is very high. In this work, we propose a Discrete Cosine Transform-based Densely Connected Convolutional Gated Recurrent Unit (DCTDCCGRU) model that uses a dilated dense block and a stacked GRU. Dense connectivity strengthens gradient propagation by concatenating the features from previous layers at the input. Within the dense block, the dilated convolutions aid context aggregation at various resolutions, and the dense connectivity, by passing features through multiple layers, yields a feature map with more precise target information. To represent the correlation between neighboring noisy speech frames, a two-layer GRU is added at the bottleneck of the U-Net. The experimental findings demonstrate that the proposed model outperforms other existing models in terms of STOI (short-time objective intelligibility), PESQ (perceptual evaluation of speech quality), and output SNR (signal-to-noise ratio).
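
As a rough illustration of why a DCT front end can sidestep separate magnitude and phase estimation, the sketch below applies an orthonormal DCT-II to one short frame and inverts it. The coefficients are real-valued, so a model operating on them has no explicit phase term to predict. This is a minimal sketch and not the authors' implementation: the helper names `dct2` and `idct2`, the 8-sample frame, and the choice of the orthonormal DCT-II are assumptions made for illustration, since the abstract does not specify the framing or transform variant.

```python
import math


def dct2(frame):
    """Orthonormal DCT-II of a real-valued frame (illustrative helper)."""
    n = len(frame)
    coeffs = []
    for k in range(n):
        # Scale factors that make the transform orthonormal.
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        acc = sum(
            x * math.cos(math.pi * (i + 0.5) * k / n)
            for i, x in enumerate(frame)
        )
        coeffs.append(scale * acc)
    return coeffs


def idct2(coeffs):
    """Inverse of dct2 (an orthonormal DCT-III)."""
    n = len(coeffs)
    frame = []
    for i in range(n):
        acc = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
            acc += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        frame.append(acc)
    return frame


# Hypothetical 8-sample frame standing in for a windowed speech segment.
frame = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
coeffs = dct2(frame)      # real-valued coefficients, no separate phase
restored = idct2(coeffs)  # matches the input up to floating-point error
```

In practice a library routine (for example a fast DCT from a signal-processing package) would replace these quadratic-time loops; the explicit form is kept here only to make the real-valued, invertible nature of the transform visible.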