Yi Jiang, Qun Yang, Yiru Zhang, Zhichao Wu, Sixing Liu, Shaohan Liu
Source
Journal: Social Science Research Network [Social Science Electronic Publishing] · Date: 2024-01-01 · Cited by: 1
Identifier
DOI: 10.2139/ssrn.4492197
Abstract
Deep learning-based speech enhancement models, such as the complex U-Net, achieve good results. However, because these methods rely on plain convolutional neural networks, they cannot effectively handle the distinctive characteristics of the speech spectrum, including long-term temporal dependencies, cross-frequency correlations, and spatial position information. In this paper, we propose a new speech enhancement model called CCAUNet. Within the model, we design a novel complex coordinate attention structure that simultaneously attends to temporal dependencies, frequency dependencies, and spatial position information. In addition, we employ a multi-resolution STFT loss, combined with an SI-SNR loss, to help the complex coordinate attention module process spectral features accurately. Experimental results on the Deep Noise Suppression Challenge dataset show that the proposed CCAUNet outperforms all compared models on the WB-PESQ, NB-PESQ, STOI, and SI-SNR metrics.
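Coordinate attention, which the abstract builds on, pools a feature map along each spatial axis separately so that the resulting attention weights retain position information along both axes. The sketch below is a minimal real-valued coordinate attention block in the style of Hou et al. (CVPR 2021), applied to a (batch, channel, time, frequency) map; it illustrates only the underlying mechanism, not the paper's complex-valued variant, and the layer sizes and reduction ratio are assumptions.

```python
import torch
import torch.nn as nn


class CoordinateAttention(nn.Module):
    """Real-valued coordinate attention over a (B, C, T, F) spectrogram feature map.

    Follows the pool-along-each-axis idea of Hou et al. (CVPR 2021); the
    complex-valued variant used by CCAUNet is not reproduced here.
    """

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        mid = max(channels // reduction, 8)  # assumed bottleneck width
        self.shared = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1),
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),
        )
        self.attn_t = nn.Conv2d(mid, channels, kernel_size=1)
        self.attn_f = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, t, f = x.shape
        # Pool along frequency to summarize each time step, and along time
        # to summarize each frequency bin.
        pooled_t = x.mean(dim=3, keepdim=True)                      # (B, C, T, 1)
        pooled_f = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # (B, C, F, 1)
        # Shared transform over the concatenated axis summaries.
        y = self.shared(torch.cat([pooled_t, pooled_f], dim=2))     # (B, mid, T+F, 1)
        y_t, y_f = torch.split(y, [t, f], dim=2)
        a_t = torch.sigmoid(self.attn_t(y_t))                       # (B, C, T, 1)
        a_f = torch.sigmoid(self.attn_f(y_f)).permute(0, 1, 3, 2)   # (B, C, 1, F)
        # Position-aware reweighting along both the time and frequency axes.
        return x * a_t * a_f
```

Likewise, a hedged sketch of the combined objective: a standard multi-resolution STFT loss (spectral convergence plus log-magnitude terms at several FFT sizes) added to a negative SI-SNR term. The FFT/hop/window sizes and the weighting factor `alpha` are illustrative assumptions; the abstract does not give the paper's exact settings.

```python
import torch
import torch.nn.functional as F


def si_snr_loss(est: torch.Tensor, ref: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Negative scale-invariant SNR, averaged over a batch of (B, T) waveforms."""
    est = est - est.mean(dim=-1, keepdim=True)
    ref = ref - ref.mean(dim=-1, keepdim=True)
    # Project the estimate onto the reference to obtain the scaled target.
    dot = (est * ref).sum(dim=-1, keepdim=True)
    target = dot * ref / (ref.pow(2).sum(dim=-1, keepdim=True) + eps)
    noise = est - target
    si_snr = 10 * torch.log10(
        target.pow(2).sum(dim=-1) / (noise.pow(2).sum(dim=-1) + eps) + eps
    )
    return -si_snr.mean()


def multi_res_stft_loss(est, ref, fft_sizes=(512, 1024, 2048),
                        hop_sizes=(128, 256, 512), win_sizes=(512, 1024, 2048),
                        eps=1e-8):
    """Spectral-convergence + log-magnitude L1 loss at several STFT resolutions."""
    loss = 0.0
    for n_fft, hop, win in zip(fft_sizes, hop_sizes, win_sizes):
        window = torch.hann_window(win, device=est.device)
        est_mag = torch.stft(est, n_fft, hop, win, window, return_complex=True).abs()
        ref_mag = torch.stft(ref, n_fft, hop, win, window, return_complex=True).abs()
        sc = torch.norm(ref_mag - est_mag, p="fro") / (torch.norm(ref_mag, p="fro") + eps)
        log_mag = F.l1_loss(torch.log(est_mag + eps), torch.log(ref_mag + eps))
        loss = loss + sc + log_mag
    return loss / len(fft_sizes)


def combined_loss(est, ref, alpha=0.5):
    # `alpha` is a hypothetical weighting between the two terms.
    return si_snr_loss(est, ref) + alpha * multi_res_stft_loss(est, ref)
```

Both sketches are self-contained and can be dropped into a standard PyTorch training loop; the SI-SNR term operates on time-domain waveforms, while the multi-resolution term compares magnitudes at several analysis scales, which is what lets it constrain both fine and coarse spectral structure.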