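Pronunciation
Computer science
Sentence
Artificial intelligence
Task (project management)
Speech recognition
Natural language processing
Phone
Adapter (computing)
Linguistics
Philosophy
Management
Economics
Operating system
Authors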
Jing Li, Rui Li, Shen Guo, Aishan Wumaier
Identifier
DOI:10.1109/apsipaasc58517.2023.10317374
Abstract
Automatic pronunciation assessment is an important part of computer-aided pronunciation training. Non-native pronunciation assessment datasets are scarce, and traditional speech assessment usually relies on Goodness of Pronunciation (GOP) features, which may not provide enough information for word- or sentence-level assessment. This paper aims to improve automatic pronunciation assessment in two ways. First, to alleviate the shortage of training data, we build a pronunciation assessment model on Whisper, a speech model trained with weak supervision. Using the Whisper encoder significantly improves the Pearson correlation coefficient (PCC) compared with traditional acoustic features. Second, we propose a multi-adapter method that fine-tunes the adapters with a multi-task loss while jointly learning the phone-, word-, and sentence-level assessment tasks, in order to boost sentence-level performance. In addition, we compare Whisper models of different scales; experiments on the open-source speechocean762 dataset show that our proposed method achieves its best results with the medium.en model.
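The abstract names two quantities without defining them: the PCC used for evaluation and a multi-task loss over phone, word, and sentence scores. The sketch below illustrates both in plain Python. It is not the authors' implementation: the mean-squared-error term, the weighted-sum form of the loss, and the names `pearson`, `mse`, and `multi_task_loss` are all assumptions made for illustration, since the abstract does not state how the loss is formed or weighted.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between predicted and reference scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined when one input is constant")
    return cov / math.sqrt(var_x * var_y)


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error; an assumed per-task regression loss."""
    if len(pred) != len(target) or not pred:
        raise ValueError("need two equal-length, non-empty sequences")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multi_task_loss(losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-level losses (assumed form; weights are hypothetical)."""
    return sum(weights[name] * value for name, value in losses.items())


if __name__ == "__main__":
    # Toy scores, invented purely to exercise the functions.
    losses = {
        "phone": mse([1.0, 2.0], [1.0, 4.0]),
        "word": mse([8.0], [9.0]),
        "sentence": mse([7.0], [7.0]),
    }
    weights = {"phone": 1.0, "word": 1.0, "sentence": 1.0}
    print("loss:", multi_task_loss(losses, weights))
    print("PCC:", pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

In a real training setup the three per-level losses would be computed on model outputs for the same utterance and combined before backpropagation, so that one set of adapter parameters receives gradients from all three tasks.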