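Short-time Fourier transform
Computer science
Mask (illustration)
Source separation
Separation (statistics)
Singing
Speech recognition
Fourier transform
Auditory masking
Artificial intelligence
Acoustics
Mathematics
Fourier analysis
Machine learning
Art
Mathematical analysis
Physics
Visual arts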
Authors
Yixuan Zhang, Yuzhou Liu, DeLiang Wang
Identifier
DOI:10.1109/icassp39728.2021.9414398
Abstract
Music source separation is important for applications such as karaoke and remixing. Much previous research focuses on estimating the short-time Fourier transform (STFT) magnitude and discards phase information. We observe that, for singing voice separation, phase can considerably improve separation quality. This paper proposes a complex ratio masking method for voice and accompaniment separation. The proposed method employs DenseUNet with self-attention to estimate the real and imaginary components of the STFT for each sound source. A simple ensemble technique is introduced to further improve separation performance. Evaluation results demonstrate that the proposed method outperforms recent state-of-the-art models for both separated voice and accompaniment.
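
As a rough illustration of the general idea behind complex ratio masking (not the authors' DenseUNet model), the NumPy sketch below builds an oracle complex mask from a known source and applies it to the mixture through its real and imaginary parts. The function names, the toy array shape, and the random arrays standing in for STFTs are all assumptions made for this example. A magnitude-only mask that reuses the mixture phase is included for comparison.

```python
import numpy as np


def ideal_complex_ratio_mask(source, mixture, eps=1e-12):
    """Oracle complex mask M with M * mixture ~= source (element-wise).

    Hypothetical helper for illustration; eps guards against division by zero.
    """
    # Complex division source / mixture, written with the conjugate.
    return source * np.conj(mixture) / (np.abs(mixture) ** 2 + eps)


def apply_complex_mask(mask_real, mask_imag, mixture):
    """Multiply the mixture STFT by a mask given as real and imaginary parts."""
    est_real = mask_real * mixture.real - mask_imag * mixture.imag
    est_imag = mask_real * mixture.imag + mask_imag * mixture.real
    return est_real + 1j * est_imag


# Toy stand-ins for STFTs (assumption: random complex values, not real audio).
rng = np.random.default_rng(0)
shape = (5, 4)  # (frequency bins, time frames)
voice = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
accompaniment = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
mixture = voice + accompaniment

# Complex masking: corrects both magnitude and phase of the mixture.
mask = ideal_complex_ratio_mask(voice, mixture)
voice_hat = apply_complex_mask(mask.real, mask.imag, mixture)

# Magnitude-only masking: correct magnitude, but the mixture phase is kept.
magnitude_mask = np.abs(voice) / (np.abs(mixture) + 1e-12)
voice_mag_only = magnitude_mask * mixture

err_complex = np.linalg.norm(voice_hat - voice)
err_mag_only = np.linalg.norm(voice_mag_only - voice)
print("complex mask error:", err_complex)
print("magnitude-only error:", err_mag_only)
```

In this toy setting the oracle complex mask recovers the source almost exactly, while the magnitude-only estimate keeps a residual error caused by the mixture phase. In a learned system the real and imaginary parts would be predicted by a network rather than computed from the known source.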