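Computer science
Speech enhancement
Computational complexity theory
Speech recognition
Intelligibility (philosophy)
Convolutional neural network
Inference
Latency (audio)
Artificial intelligence
Algorithm
Noise reduction
Telecommunications
Philosophy
Epistemology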
Authors
A. A. Rajabi, Mohammed Krini
Identifier
DOI:10.1109/ispa58351.2023.10278916
Abstract
Real-time communication over cell phones and telephones often takes place in challenging acoustic environments where the original speech signal is contaminated by environmental noise, a situation known as the cocktail party problem. Audio source separation can be an effective way to isolate the voice in a noisy environment: it suppresses undesired noise without distorting speech components, which can improve speech quality and intelligibility. Deep Neural Network (DNN) models perform very well in speech enhancement but require substantial computational effort during inference, which makes them less than ideal for this problem. The high computational complexity of deep models can also increase latency, which must be kept low in real-time applications. With these constraints in mind, this paper presents a novel neural network for speech enhancement that incorporates phase information into the loss function. The proposed method uses a Convolutional Recurrent Dense (CRD) network, which is reported to be computationally efficient while outperforming other existing networks. Experimental results are provided to highlight the advantages and distinctions of the CRD network compared with alternative state-of-the-art approaches.