Authors
Iulia Turc, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
Source
Journal: Cornell University - arXiv
Date: 2019-01-01
Citations: 428
Identifier
DOI: 10.48550/arxiv.1908.08962
Abstract
Recent developments in natural language representations have been accompanied by large and expensive models that leverage vast amounts of general-domain text through self-supervised pre-training. Due to the cost of applying such models to down-stream tasks, several model compression techniques on pre-trained language representations have been proposed (Sun et al., 2019; Sanh, 2019). However, surprisingly, the simple baseline of just pre-training and fine-tuning compact models has been overlooked. In this paper, we first show that pre-training remains important in the context of smaller architectures, and fine-tuning pre-trained compact models can be competitive to more elaborate methods proposed in concurrent work. Starting with pre-trained compact models, we then explore transferring task knowledge from large fine-tuned models through standard knowledge distillation. The resulting simple, yet effective and general algorithm, Pre-trained Distillation, brings further improvements. Through extensive experiments, we more generally explore the interaction between pre-training and distillation under two variables that have been under-studied: model size and properties of unlabeled task data. One surprising observation is that they have a compound effect even when sequentially applied on the same data. To accelerate future research, we will make our 24 pre-trained miniature BERT models publicly available.
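The abstract describes a simple recipe: start from a compact model that has already been pre-trained, then distill task knowledge from a large fine-tuned teacher using its soft predictions over (possibly unlabeled) task data. The sketch below illustrates that distillation step in Python with PyTorch and Hugging Face Transformers, assuming the availability of a fine-tuned teacher; the specific model names, the two-label setup, and the toy batch are illustrative assumptions, not the authors' released code or exact hyperparameters.

```python
# Minimal sketch of distilling a fine-tuned teacher into a pre-trained compact student.
# Assumptions: TEACHER stands in for a large fine-tuned model, STUDENT for one of the
# released miniature BERTs, and the task is binary sentence classification.
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

TEACHER = "bert-base-uncased"                   # assumed: large teacher, fine-tuned on the task
STUDENT = "google/bert_uncased_L-2_H-128_A-2"   # assumed: one of the 24 miniature BERT checkpoints

tokenizer = AutoTokenizer.from_pretrained(TEACHER)
teacher = AutoModelForSequenceClassification.from_pretrained(TEACHER, num_labels=2).eval()
student = AutoModelForSequenceClassification.from_pretrained(STUDENT, num_labels=2)

optimizer = torch.optim.AdamW(student.parameters(), lr=3e-5)
temperature = 1.0  # kept explicit; the paper's recipe trains on the teacher's soft labels

def distill_step(texts):
    """One optimization step on a batch of (unlabeled) task sentences."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        teacher_logits = teacher(**batch).logits   # soft targets from the teacher
    student_logits = student(**batch).logits
    # KL divergence between teacher and student output distributions.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage on a toy batch of unlabeled sentences:
print(distill_step(["a gripping, well-acted thriller", "flat characters and a dull plot"]))
```

In the paper's framing, the key point is that the student is initialized from its own pre-trained compact checkpoint before this distillation loop runs, rather than being trained from scratch; the loop itself is standard knowledge distillation.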