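Computer science
Speaker recognition
Speech recognition
Artificial intelligence
Artificial neural network
Set (abstract data type)
Training set
Word error rate
Pattern recognition (psychology)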
Test apparatus
Speaker diarisation
Programming language
Authors
Ruijie Tao,Kong Aik Lee,Rohan Kumar Das,Ville Hautamäki,Haizhou Li
Identifier
DOI:10.1109/icassp43922.2022.9747162
Abstract
In self-supervised learning for speaker recognition, pseudo labels serve as supervision signals. However, a speaker recognition model does not always benefit from pseudo labels because they can be unreliable. In this work, we observe that a speaker recognition network tends to fit data with reliable labels faster than data with unreliable labels. This motivates us to study a loss-gated learning (LGL) strategy, which extracts the reliable labels through the fitting ability of the neural network during training. With the proposed LGL, our speaker recognition model obtains a 46.3% performance gain over the system without it. Furthermore, the proposed self-supervised speaker recognition system with LGL, trained on the VoxCeleb2 dataset without any labels, achieves an equal error rate of 1.66% on the VoxCeleb1 original test set.
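The gating idea in the abstract can be illustrated with a minimal sketch. It assumes that "reliable" means a per-sample loss below a fixed threshold and that the loss is softmax cross-entropy against the pseudo label; the abstract specifies neither, and the names `cross_entropy`, `loss_gated_batch_loss`, and `threshold` are hypothetical, chosen only for illustration.

```python
import math


def cross_entropy(logits, label):
    """Per-sample softmax cross-entropy for one logit vector and an integer label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def loss_gated_batch_loss(batch_logits, pseudo_labels, threshold):
    """Average the loss over samples whose loss is below `threshold`.

    Assumption: a low loss marks a pseudo label as reliable, so samples
    above the threshold are excluded from the update.
    Returns (mean_loss, kept_indices); mean_loss is 0.0 if no sample passes.
    """
    losses = [cross_entropy(z, y) for z, y in zip(batch_logits, pseudo_labels)]
    kept = [i for i, loss in enumerate(losses) if loss < threshold]
    if not kept:
        return 0.0, kept
    return sum(losses[i] for i in kept) / len(kept), kept


# Toy batch: samples 0 and 2 agree with their pseudo labels, sample 1 does not.
batch_logits = [[4.0, 0.0, 0.0], [0.0, 0.0, 4.0], [0.0, 3.0, 0.0]]
pseudo_labels = [0, 0, 1]
mean_loss, kept = loss_gated_batch_loss(batch_logits, pseudo_labels, threshold=1.0)
print(kept, mean_loss)
```

In this toy batch the second sample has a high loss against its pseudo label, so it is left out of the averaged loss. In a real training loop the same mask would be applied to the loss tensor before backpropagation.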