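Softmax function
Computer science
Artificial intelligence
Classifier (UML)
Inference
Speech recognition
Feature learning
Word error rate
Sequence learning
Pattern recognition (psychology)
Deep learning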
Authors
Keqi Deng,Gaofeng Cheng,Runyan Yang,Yonghong Yan
Source
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
[Institute of Electrical and Electronics Engineers]
Date: 2022-01-01
Volume: 30, pages 340-354
Citations: 6
Identifier
DOI:10.1109/taslp.2021.3138707
Abstract
End-to-end (E2E) automatic speech recognition (ASR) has improved substantially in recent years. However, tackling the long-tailed data distribution problem while maintaining E2E ASR models' performance on high-frequency tokens remains challenging. To address this, we propose a novel decoupled ASR learning method for the sequence-to-sequence ASR architecture. Our method decouples the learning procedure of the model into two stages: representation learning and classification learning. In the representation learning stage, we use the encoder output of a pretrained language model as one of the ASR model's learning targets, and propose a threshold log cosine embedding loss (TLCE-loss) as the objective function. A frequency-mask cross-entropy loss (FMCE-loss) is also designed as an auxiliary loss. In the classification learning stage, we find that introducing a temperature into the softmax function helps reduce the influence of negative samples on tail classes, thus mitigating the biased learning process for the classifier. Furthermore, we propose a weighted softmax (w-softmax) that adjusts ASR posterior probabilities according to token frequency during inference. Additionally, we introduce tail word/character error rate (TWER/TCER) and head word/character error rate (HWER/HCER), which evaluate ASR accuracy on tail and head words/characters, respectively. Experimental results on the Switchboard and HKUST corpora show that the proposed method greatly outperforms the baseline, especially in TWER/TCER reduction. To the best of our knowledge, this is the first work to use a decoupled ASR learning method to alleviate the long-tailed problem in sequence-to-sequence ASR.
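The two softmax variants named in the abstract can be illustrated with a minimal sketch. The temperature version below is the standard softmax(z / T). The abstract does not give the w-softmax formula, so `frequency_weighted_softmax`, its inverse-frequency weights `count ** -alpha`, and the `alpha` parameter are illustrative assumptions rather than the authors' definition. The example covers only the forward computation, not the training procedure.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Standard softmax over logits divided by a temperature T > 0."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def frequency_weighted_softmax(logits, token_counts, alpha=0.5):
    """Hypothetical frequency-aware softmax (NOT the paper's w-softmax formula).

    Assumption: each token's exponentiated logit is scaled by
    count ** -alpha, so rarer tokens receive a larger weight.
    alpha = 0 recovers the plain softmax.
    """
    if any(c <= 0 for c in token_counts):
        raise ValueError("token counts must be positive")
    peak = max(logits)
    weighted = [
        math.exp(z - peak) * (c ** -alpha)
        for z, c in zip(logits, token_counts)
    ]
    total = sum(weighted)
    return [w / total for w in weighted]


# Toy example: three tokens, the last one is a rare (tail) token.
logits = [2.0, 1.0, 0.1]
token_counts = [1000, 100, 1]

plain = softmax_with_temperature(logits, temperature=1.0)
flattened = softmax_with_temperature(logits, temperature=2.0)
reweighted = frequency_weighted_softmax(logits, token_counts, alpha=0.5)
```

In this toy setting a temperature above 1 flattens the distribution, and the assumed inverse-frequency weighting shifts probability mass toward the rare token. Whether the paper's w-softmax behaves this way depends on its actual definition, which should be taken from the full text.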