Enhancing Low-Resource NLP by Consistency Training With Data and Model Permutations

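Consistency (knowledge base), Computer science, Artificial intelligence, Overfitting, Natural language processing, Benchmark (surveying), Machine learning, Machine translation, Resource (disambiguation), Generalization, Deep learning, Artificial neural network, Mathematics, Computer network, Mathematical analysis, Geodesy, Geography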
Authors
Xiaobo Liang,Robert Mao,Lijun Wu,Juntao Li,Min Zhang,Qing Li
Source
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing [Institute of Electrical and Electronics Engineers]
Volume: 32, pp. 189-199
Identifier
DOI:10.1109/taslp.2023.3325970
Abstract

Natural language processing (NLP) has recently shown significant progress in rich-resource scenarios. However, it is much less effective in low-resource scenarios, because models easily overfit the limited training data and generalize poorly to test data. In recent years, consistency training has been widely adopted and has shown great promise in deep learning, but it remains unexplored in low-resource settings. In this work, we propose DM-CT, a framework that incorporates both data-level and model-level consistency training, as well as advanced data augmentation techniques, for low-resource scenarios. Concretely, the input data is first augmented, and the output distributions of different sub-models generated by model variance are forced to be consistent (model-level consistency). Meanwhile, the predictions for the original input and the augmented one are also constrained to be consistent (data-level consistency). Experiments on different low-resource NLP tasks, including neural machine translation (four IWSLT14 translation tasks, a multilingual translation task, and WMT16 Romanian $\rightarrow$ English translation), natural language understanding (the GLUE benchmark), and named entity recognition (CoNLL-2003 and WikiGold), demonstrate the superiority of DM-CT, which obtains significant and consistent performance improvements.
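The two consistency terms described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the loss, so the symmetric KL divergence, the input-dropout "sub-models", the Gaussian-noise augmentation, the weights `alpha` and `beta`, and every function name below are assumptions chosen only to make the idea concrete.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def symmetric_kl(p, q):
    # Assumed consistency measure; the abstract does not name one.
    return 0.5 * (kl(p, q) + kl(q, p))

def toy_model(features, weights, drop_prob, rng):
    # Hypothetical stand-in for a sub-model: a linear classifier whose
    # input features are randomly dropped (inverted dropout) on each call.
    kept = [0.0 if rng.random() < drop_prob else x / (1.0 - drop_prob)
            for x in features]
    logits = [sum(w * x for w, x in zip(row, kept)) for row in weights]
    return softmax(logits)

def augment(features, noise, rng):
    # Hypothetical augmentation: additive Gaussian noise on the features.
    return [x + rng.gauss(0.0, noise) for x in features]

def dm_ct_loss(features, label, weights, rng,
               drop_prob=0.1, noise=0.05, alpha=1.0, beta=1.0):
    aug = augment(features, noise, rng)
    # Two stochastic passes over the augmented input act as two sub-models.
    p1 = toy_model(aug, weights, drop_prob, rng)
    p2 = toy_model(aug, weights, drop_prob, rng)
    # One pass over the original, un-augmented input.
    p_orig = toy_model(features, weights, drop_prob, rng)
    task = -math.log(p_orig[label] + 1e-12)   # cross-entropy on the label
    model_level = symmetric_kl(p1, p2)        # sub-model vs. sub-model
    data_level = symmetric_kl(p_orig, p1)     # original vs. augmented
    total = task + alpha * model_level + beta * data_level
    return total, model_level, data_level

# Example call with made-up weights and features.
rng = random.Random(0)
W = [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.2]]
x = [1.0, 2.0, -1.0]
total, model_level, data_level = dm_ct_loss(x, 0, W, rng)
```

In this sketch both consistency terms vanish when dropout and augmentation are switched off, so the total reduces to the ordinary task loss; the extra terms only penalize disagreement introduced by the perturbations.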