Spectrogram
Computer science
Speech recognition
Distortion (music)
Artificial intelligence
Pattern recognition (psychology)
Matching (statistics)
Feature extraction
Metric (unit)
Artificial neural network
Margin (machine learning)
Task (project management)
Machine learning
Mathematics
Statistics
Computer network
Amplifier
Operations management
Management
Bandwidth (computing)
Economics
Authors
Ying Hu,Haitao Xu,Zhongcun Guo,Hao Huang,Liang He
Identifier
DOI:10.1109/icassp48485.2024.10447832
Abstract
We propose a deep neural network with spectrogram matching and mutual attention (SMMA-Net) for audio clue-based target speaker extraction (TSE). To make effective use of the auxiliary speech, we propose a spectrogram matching (SM) strategy and a mutual attention (MA) block. We conducted all experiments on the WSJ0-2mix-extr dataset. Ablation and comparison studies verified the effectiveness of the SM strategy and the MA block. The experimental results show that our proposed method outperforms state-of-the-art methods by a sizable margin of 1.3 dB in scale-invariant signal-to-distortion ratio improvement. Additionally, with SMMA-Net the performance on the TSE task exceeds that on the speaker separation task under a similar architecture. The main code will be available at https://github.com/Ht-Xu/SMMA-Net.
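The abstract only describes the architecture at a high level. For orientation, the sketch below shows one way a "mutual attention" block between the mixture features and the auxiliary (enrollment) speech features could be wired up in PyTorch. This is not the authors' implementation (that is linked in the abstract); the layer sizes, the use of nn.MultiheadAttention, and the residual fusion are assumptions made purely for illustration.

```python
# Minimal, illustrative sketch of bidirectional (mutual) cross-attention between
# mixture features and auxiliary-speech features. Not the SMMA-Net code; see the
# repository linked above for the authors' implementation.
import torch
import torch.nn as nn


class MutualAttentionSketch(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Cross-attention in both directions: mixture -> auxiliary and auxiliary -> mixture.
        self.mix_to_aux = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.aux_to_mix = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_mix = nn.LayerNorm(dim)
        self.norm_aux = nn.LayerNorm(dim)

    def forward(self, mix: torch.Tensor, aux: torch.Tensor):
        # mix: (batch, T_mix, dim) mixture features; aux: (batch, T_aux, dim) auxiliary features.
        mix_att, _ = self.mix_to_aux(query=mix, key=aux, value=aux)
        aux_att, _ = self.aux_to_mix(query=aux, key=mix, value=mix)
        # Residual connections keep each stream while injecting cues from the other.
        return self.norm_mix(mix + mix_att), self.norm_aux(aux + aux_att)


if __name__ == "__main__":
    block = MutualAttentionSketch(dim=64)
    mix = torch.randn(2, 100, 64)  # mixture spectrogram features
    aux = torch.randn(2, 80, 64)   # enrollment (auxiliary) speech features
    out_mix, out_aux = block(mix, aux)
    print(out_mix.shape, out_aux.shape)
```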