Computer science
Pronunciation
Speech recognition
Mandarin
Syllable
Word error rate
Task (project management)
Sampling (signal processing)
Artificial intelligence
Natural language processing
Acoustic model
Speech corpus
Speech processing
Speech synthesis
Linguistics
Philosophy
Management
Filter (signal processing)
Computer vision
Economics
Authors
I-Ting Hsieh, Chung-Hsien Wu, Zhe-Hong Zhao
Source
Journal: IEEE Access [Institute of Electrical and Electronics Engineers]
Date: 2024-01-01
Volume 12, pp. 106070-106083
Identifier
DOI: 10.1109/ACCESS.2024.3437755
Abstract
Under-resourced automatic speech recognition (ASR) has become an active field of research and has experienced significant progress during the past decade. However, the performance of under-resourced ASR trained by existing methods is still far inferior to that of high-resourced ASR for practical applications. In this paper, speech data from languages that share the most phonemes with the under-resourced language are selected as supplementary resources for meta-training based on the Model-Agnostic Meta-Learning (MAML) strategy. Beyond supplementary language selection, this paper proposes a dynamic sampling method, instead of the original random sampling method, to select support and query sets for each task in MAML to improve meta-training performance. In this study, Taiwanese is selected as the under-resourced language, and the speech corpora of five languages, namely Mandarin, English, Japanese, Cantonese, and Thai, are chosen as supplementary training data for acoustic model training. The proposed dynamic sampling approach uses phonemes, pronunciation, and speech recognition models as the basis to determine the proportion of each supplementary language and to select helpful utterances for MAML. In evaluation, with the utterances selected from each supplementary language for meta-training, we obtained a Word Error Rate of 20.24% and a Syllable Error Rate of 8.35% for Taiwanese ASR, outperforming the baseline model (26.18% and 13.99%) trained only on the Taiwanese corpus, as well as other methods.
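The core of the dynamic sampling idea described in the abstract is to weight each supplementary language by how well it matches the target language, then draw support and query utterances for each MAML task in proportion to those weights. The sketch below illustrates only the phoneme-overlap component with made-up, hypothetical phoneme inventories and corpus names; the paper's actual method also incorporates pronunciation and recognition-model cues, which are omitted here.

```python
import random

# Hypothetical phoneme inventories (illustrative only; not the paper's data).
TARGET_PHONEMES = {"a", "i", "u", "e", "o", "p", "t", "k", "m", "n", "ng", "s", "h", "l"}

SUPPLEMENTARY = {
    "Mandarin":  {"a", "i", "u", "e", "o", "p", "t", "k", "m", "n", "ng", "s", "x", "sh"},
    "English":   {"a", "i", "u", "e", "p", "t", "k", "m", "n", "ng", "s", "z", "th", "r"},
    "Japanese":  {"a", "i", "u", "e", "o", "p", "t", "k", "m", "n", "s", "h", "w", "y"},
    "Cantonese": {"a", "i", "u", "e", "o", "p", "t", "k", "m", "n", "ng", "s", "h", "l"},
    "Thai":      {"a", "i", "u", "e", "o", "p", "t", "k", "m", "n", "ng", "s", "h", "w"},
}

def overlap_weights(target, supplementary):
    """Weight each supplementary language by the fraction of the target
    phoneme inventory it covers, normalized to a sampling distribution."""
    raw = {lang: len(target & phones) / len(target)
           for lang, phones in supplementary.items()}
    total = sum(raw.values())
    return {lang: w / total for lang, w in raw.items()}

def sample_task(corpora, weights, support_size=4, query_size=4, rng=None):
    """Draw support and query sets for one meta-learning task, choosing each
    utterance's language in proportion to the overlap weights."""
    rng = rng or random.Random()
    langs = list(weights)
    probs = [weights[lang] for lang in langs]

    def draw(n):
        picks = rng.choices(langs, weights=probs, k=n)
        return [(lang, rng.choice(corpora[lang])) for lang in picks]

    return draw(support_size), draw(query_size)
```

In a full MAML loop, each sampled (support, query) pair would drive one inner-loop adaptation step and one outer-loop meta-update of the acoustic model; the sketch stops at task construction, which is the part the abstract's dynamic sampling modifies.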