Authors: Ying Hu, Yadong Chen, Danny Chen, Liang He, Hao Huang
Source
Journal: IEEE Signal Processing Letters (Institute of Electrical and Electronics Engineers)
Date: 2022-01-01
Volume/Pages: 29: 1517-1521
Citations: 15
Identifier
DOI: 10.1109/lsp.2022.3187316
Abstract
Recently, time-domain methods (i.e., methods that model the raw waveform directly) for audio source separation have shown tremendous potential. In this paper, we propose a model that combines complex-spectrogram-domain and time-domain features through a cross-domain encoder (CDE) and adopts a hierarchic temporal convolutional network (HTCN) for multiple music source separation. The CDE is designed to enable the network to encode the interactive information of the time-domain and complex-spectrogram-domain features. The HTCN enables the network to learn long-range temporal dependencies effectively. We also design a feature calibration unit (FCU) applied within the HTCN and adopt a multi-stage training strategy. An ablation study demonstrates the effectiveness of each designed component of the model. We conducted experiments on the MUSDB18 dataset. The results indicate that our proposed CDE-HTCN model outperforms top-of-the-line methods and, compared with the state-of-the-art method DEMUCS, improves the average SDR score by 0.61 dB. Notably, the SDR improvement for the *bass* source has a sizable margin of 0.91 dB.
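The abstract reports gains in SDR (Signal-to-Distortion Ratio), the standard evaluation metric for source separation. As a point of reference (not code from the paper), a minimal sketch of the basic SDR definition, 10·log10(‖s‖² / ‖s − ŝ‖²), where s is the reference source and ŝ the separated estimate:

```python
import math

def sdr_db(reference, estimate):
    """Basic Signal-to-Distortion Ratio in dB: 10*log10(||s||^2 / ||s - s_hat||^2).

    `reference` and `estimate` are equal-length sequences of audio samples.
    (Benchmark toolkits such as BSS Eval compute a more elaborate variant
    that allows for filtering distortions; this is the plain definition.)
    """
    signal_power = sum(s * s for s in reference)
    error_power = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if error_power == 0.0:
        return float("inf")  # perfect reconstruction
    return 10.0 * math.log10(signal_power / error_power)

# Hypothetical example: an estimate that recovers the signal at 0.9x amplitude.
ref = [1.0, -1.0, 1.0, -1.0]
est = [0.9, -0.9, 0.9, -0.9]
# per-sample error is 0.1, so the power ratio is 1 / 0.01 = 100 -> 20 dB
print(round(sdr_db(ref, est), 2))  # -> 20.0
```

Higher is better: the reported 0.61 dB average improvement over DEMUCS means the separated estimates are, on average, measurably closer in power to the reference stems.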