
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

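Leverage (statistics); Computer science; Inference; Language understanding; Distillation; Language model; Computation; Task (project management); Representation (politics); Artificial intelligence; Edge device; Machine learning; Enhanced Data Rates for GSM Evolution; Natural language processing; Algorithm; Engineering; Cloud computing; Organic chemistry; Systems engineering; Politics; Law; Political science; Operating system; Chemistry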

Authors

Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf

Source

Journal: Cornell University - arXiv; Date: 2019-01-01; Citations: 3817

Links

arxiv.org; datacite.org; doi.org

Identifiers

DOI: 10.48550/arxiv.1910.01108

Abstract

As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models in on-the-edge and/or under constrained computational training or inference budgets remains challenging. In this work, we propose a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can then be fine-tuned with good performances on a wide range of tasks like its larger counterparts. While most prior work investigated the use of distillation for building task-specific models, we leverage knowledge distillation during the pre-training phase and show that it is possible to reduce the size of a BERT model by 40%, while retaining 97% of its language understanding capabilities and being 60% faster. To leverage the inductive biases learned by larger models during pre-training, we introduce a triple loss combining language modeling, distillation and cosine-distance losses. Our smaller, faster and lighter model is cheaper to pre-train and we demonstrate its capabilities for on-device computations in a proof-of-concept experiment and a comparative on-device study.
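The triple loss named in the abstract can be illustrated with a minimal pure-Python sketch. The abstract only says that language modeling, distillation, and cosine-distance losses are combined; the function names, the temperature of 2.0, the equal weights, and the use of a plain weighted sum below are all assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's softened distribution."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))


def language_modeling_loss(student_logits, target_index):
    """Cross-entropy of the student against the true (masked) token."""
    return -math.log(softmax(student_logits)[target_index])


def cosine_loss(student_hidden, teacher_hidden):
    """One minus the cosine similarity of two hidden-state vectors."""
    dot = sum(a * b for a, b in zip(student_hidden, teacher_hidden))
    norm_s = math.sqrt(sum(a * a for a in student_hidden))
    norm_t = math.sqrt(sum(b * b for b in teacher_hidden))
    return 1.0 - dot / (norm_s * norm_t)


def triple_loss(student_logits, teacher_logits, target_index,
                student_hidden, teacher_hidden,
                w_distill=1.0, w_lm=1.0, w_cos=1.0, temperature=2.0):
    """Weighted sum of the three terms (weights are hypothetical)."""
    return (w_distill * distillation_loss(student_logits, teacher_logits, temperature)
            + w_lm * language_modeling_loss(student_logits, target_index)
            + w_cos * cosine_loss(student_hidden, teacher_hidden))


if __name__ == "__main__":
    # Toy values for a single token position with a three-word vocabulary.
    loss = triple_loss(
        student_logits=[0.5, 1.5, 2.5],
        teacher_logits=[1.0, 2.0, 3.0],
        target_index=2,
        student_hidden=[0.1, 0.4, -0.2],
        teacher_hidden=[0.2, 0.5, -0.1],
    )
    print(f"triple loss: {loss:.4f}")
```

In this sketch the distillation term pulls the student's output distribution toward the teacher's, the language modeling term keeps the student tied to the true tokens, and the cosine term aligns the direction of the two models' hidden-state vectors.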
