Tts4pretrain 2.0: Advancing the use of Text and Speech in ASR Pretraining with Consistency and Contrastive Losses

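Computer science, Speech recognition, Encoder, Consistency (knowledge bases), Regularization (linguistics), Word error rate, Speech error, Natural language processing, Artificial intelligence, Speech production, Operating system
Authors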
Zhehuai Chen,Yu Zhang,Andrew Rosenberg,Bhuvana Ramabhadran,Pedro J. Moreno,Gary Wang
Identifier
DOI:10.1109/icassp43922.2022.9746475
Abstract

An effective way to learn representations from untranscribed speech and unspoken text with linguistic/lexical representations derived from synthesized speech was introduced in tts4pretrain [1]. However, the representations learned from synthesized and real speech are likely to be different, potentially limiting the improvements from incorporating unspoken text. In this paper, we introduce learning from supervised speech earlier in the training process, with consistency-based regularization between real and synthesized speech. This allows for better learning of shared speech and text representations. Specifically, we introduce a new objective with encoder and decoder consistency and contrastive regularization between real and synthesized speech derived from the labeled corpora during the pretraining stage. We show that the new objective leads to more similar representations derived from speech and text, which help downstream ASR. The proposed pretraining method yields Word Error Rate (WER) reductions of 7-21% relative on six public corpora (Librispeech, AMI, TEDLIUM, Common Voice, Switchboard, and CHiME-6) over a state-of-the-art baseline pretrained with wav2vec2.0, and of 2-17% over the previously proposed tts4pretrain. The proposed method outperforms the supervised SpeechStew by up to 17%. Moreover, we show that the proposed method also yields WER reductions on larger data sets by evaluating on a large-resource, in-house Voice Search task and on streaming ASR.
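
The gains in the abstract are quoted as relative WER reductions. As a minimal sketch of how such a figure is usually computed (the function name and the example numbers are hypothetical and are not taken from the paper), the reduction is the drop in WER divided by the baseline WER:

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction of new_wer over baseline_wer, in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WER values for illustration only; these are not results from the paper.
baseline = 10.0  # baseline system WER, in percent
proposed = 8.5   # compared system WER, in percent
print(f"{relative_wer_reduction(baseline, proposed):.1f}% relative")  # prints "15.0% relative"
```

Under these assumed numbers the absolute reduction is 1.5 WER points, while the relative reduction is 15%; the abstract's 7-21% and 2-17% ranges are of the latter kind.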
