Gradformer: A Framework for Multi-Aspect Multi-Granularity Pronunciation Assessment

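Subject tags: Granularity, Computer science, Encoder, Pronunciation, Transformer, Correlation, Speech recognition, Artificial intelligence, Mathematics, Voltage, Philosophy, Linguistics, Physics, Geometry, Quantum mechanics, Operating system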
Authors
Hao-Chen Pei, Hao Fang, Xin Luo, Xin-Shun Xu
Source
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing [Institute of Electrical and Electronics Engineers]
Volume 32, pp. 554-563; cited by 2
Identifier
DOI: 10.1109/taslp.2023.3335807
Abstract

Automatic pronunciation assessment is an indispensable technology in computer-assisted pronunciation training (CAPT) systems. To evaluate pronunciation quality more thoroughly, multi-task learning that simultaneously outputs multi-aspect scores at multiple granularities has become the mainstream solution. Existing methods follow one of two designs. Some predict the scores at all granularity levels simultaneously through a parallel structure. Others predict the scores of each granularity level layer by layer through a hierarchical structure. However, these methods do not fully understand or exploit the correlations among the three granularity levels: phoneme, word, and utterance. To address this issue, we propose a novel method, the Granularity-decoupled Transformer (Gradformer), which models the relationships between multiple granularity levels. Specifically, we first use a convolution-augmented transformer encoder to encode acoustic features; the convolution module helps the model capture local information. The encoder directly outputs both the phoneme-level and word-level scores, two granularities that are highly correlated. We then use utterance queries that interact with the encoder output through a transformer decoder to obtain the utterance-level scores. Through this encoder-decoder architecture, we decouple the three granularity levels while still handling the relationships between them. Experiments on the speechocean762 dataset show that our model has advantages over state-of-the-art methods on various metrics, especially on key metrics such as phoneme accuracy, word accuracy, and total score.
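The data flow described in the abstract can be illustrated with a small NumPy forward pass. In this flow, a convolution-augmented encoder produces phoneme-level and word-level scores, and learnable utterance queries cross-attend to the encoder output to produce utterance-level scores.

This is a minimal sketch under stated assumptions, not the authors' implementation:

- **Weights.** They are random and untrained.
- **Encoder.** It is reduced to one single-head self-attention sublayer plus one depthwise-convolution sublayer. There are no feed-forward modules, normalization, or multi-head attention.
- **Decoder.** It is reduced to a single cross-attention step, with no self-attention among the queries.
- **Word-level scores.** They are assumed to be predicted per phone and averaged over each word's phones via a hypothetical `word_ids` array.
- **Default sizes.** These are chosen to match speechocean762's label set: 1 phone aspect, 3 word aspects, and 5 utterance aspects.
- **Names.** `GradformerSketch`, `attention`, `depthwise_conv1d`, and all parameter names are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Single-head scaled dot-product attention: q (m, d), k and v (n, d) -> (m, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def depthwise_conv1d(x, kernel):
    """Per-channel 1-D convolution over the sequence axis with 'same' zero padding.
    x: (n, d); kernel: (width, d) with odd width. Captures local context."""
    width = kernel.shape[0]
    pad = width // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([(xp[t:t + width] * kernel).sum(axis=0) for t in range(x.shape[0])])

class GradformerSketch:
    """Hypothetical, untrained illustration of the encoder/decoder split."""

    def __init__(self, d_model=32, n_word_aspects=3, n_utt_aspects=5, conv_width=3, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(d_model)
        # Encoder self-attention projections and decoder cross-attention projections.
        self.eq, self.ek, self.ev, self.dq, self.dk, self.dv = (
            rng.normal(0, s, (d_model, d_model)) for _ in range(6))
        self.conv = rng.normal(0, s, (conv_width, d_model))
        self.phone_head = rng.normal(0, s, (d_model, 1))             # phone accuracy
        self.word_head = rng.normal(0, s, (d_model, n_word_aspects))  # word aspects
        self.utt_queries = rng.normal(0, 1.0, (n_utt_aspects, d_model))  # one query per aspect
        self.utt_head = rng.normal(0, s, (d_model, 1))

    def encode(self, x):
        # Global context: self-attention sublayer with a residual connection.
        h = x + attention(x @ self.eq, x @ self.ek, x @ self.ev)
        # Local context: convolution sublayer with a residual connection.
        return h + depthwise_conv1d(h, self.conv)

    def forward(self, x, word_ids):
        enc = self.encode(x)                                    # (n_phones, d)
        phone_scores = (enc @ self.phone_head)[:, 0]            # (n_phones,)
        per_phone_word = enc @ self.word_head                   # (n_phones, n_word_aspects)
        # Assumption: pool per-phone word predictions by averaging within each word.
        word_scores = np.stack([per_phone_word[word_ids == w].mean(axis=0)
                                for w in np.unique(word_ids)])  # (n_words, n_word_aspects)
        # Decoder step: utterance queries cross-attend to the encoder output.
        ctx = attention(self.utt_queries @ self.dq, enc @ self.dk, enc @ self.dv)
        utt_scores = ((self.utt_queries + ctx) @ self.utt_head)[:, 0]  # (n_utt_aspects,)
        return phone_scores, word_scores, utt_scores

# Usage: 7 phones with 32-dim features, grouped into 3 words (hypothetical mapping).
model = GradformerSketch()
x = np.random.default_rng(1).normal(size=(7, 32))
word_ids = np.array([0, 0, 1, 1, 1, 2, 2])
p, w, u = model.forward(x, word_ids)
print(p.shape, w.shape, u.shape)  # (7,) (3, 3) (5,)
```

The sketch shows the "decoupling" idea structurally. Phoneme and word outputs are read directly off the shared encoder sequence. The utterance outputs come only through the query/cross-attention path, so they can pool information from every phone without being tied to any single position.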