LightGBM: A Highly Efficient Gradient Boosting Decision Tree

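Keywords: Boosting (machine learning), Computer science, Decision tree, Gradient boosting, Alternating decision tree, Artificial intelligence, Incremental decision tree, Machine learning, Decision tree learning, Random forest
Authors
Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, Tie-Yan Liu
Source
Venue: Neural Information Processing Systems. Citations: 5295
Abstract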

Gradient Boosting Decision Tree (GBDT) is a popular machine learning algorithm with several effective implementations, such as XGBoost and pGBRT. Although these implementations adopt many engineering optimizations, their efficiency and scalability remain unsatisfactory when the feature dimension is high and the data size is large. A major reason is that, for each feature, they must scan all data instances to estimate the information gain of every possible split point, which is very time-consuming. To tackle this problem, we propose two novel techniques: Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB). With GOSS, we exclude a significant proportion of the data instances with small gradients and use only the rest to estimate the information gain. We prove that, because data instances with larger gradients play a more important role in the computation of information gain, GOSS can obtain a quite accurate estimate of the information gain with a much smaller data size. With EFB, we bundle mutually exclusive features (i.e., features that rarely take nonzero values simultaneously) to reduce the number of features. We prove that finding the optimal bundling of exclusive features is NP-hard, but that a greedy algorithm can achieve a quite good approximation ratio and can thus effectively reduce the number of features without hurting the accuracy of split point determination by much. We call our new GBDT implementation with GOSS and EFB LightGBM. Our experiments on multiple public datasets show that LightGBM speeds up the training process of conventional GBDT by over 20 times in the best case while achieving almost the same accuracy.
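
The two techniques named in the abstract can be illustrated with a small, self-contained Python sketch. It is not the LightGBM implementation. The abstract gives no algorithmic details, so the sketch rests on stated assumptions: for GOSS, the top fraction `a` of instances by absolute gradient is always kept, a fraction `b` of all instances is drawn at random from the remainder, and the drawn instances are up-weighted by `(1 - a) / b`; for EFB, features are visited in order of nonzero count and placed in the first bundle with which they have at most `max_conflicts` overlapping nonzero rows, which is a simplification of a greedy bundling strategy. The function names `goss_sample` and `bundle_exclusive_features` and all parameter names are hypothetical.

```python
import random


def goss_sample(gradients, a=0.2, b=0.1, seed=0):
    """Return (indices, weights) for one GOSS-style sampling round.

    a: fraction of instances with the largest |gradient|, always kept.
    b: fraction of all instances drawn at random from the remainder.
    Both names are assumptions made for this sketch.
    """
    n = len(gradients)
    top_n = int(a * n)
    rand_n = int(b * n)
    # Sort instance indices by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top, rest = order[:top_n], order[top_n:]
    rng = random.Random(seed)
    sampled = rng.sample(rest, min(rand_n, len(rest)))
    # Up-weight the sampled small-gradient instances so their total
    # contribution approximates that of the full small-gradient set.
    amplify = (1.0 - a) / b if b > 0 else 0.0
    indices = top + sampled
    weights = [1.0] * len(top) + [amplify] * len(sampled)
    return indices, weights


def bundle_exclusive_features(columns, max_conflicts=0):
    """Greedily group feature columns that are rarely nonzero together.

    columns: list of feature columns, each a list of values per row.
    max_conflicts: rows in which a feature may overlap with a bundle
    (an assumed, simplified conflict rule for this sketch).
    """
    nonzero_rows = [{r for r, v in enumerate(col) if v != 0} for col in columns]
    # Visit features with the most nonzero entries first.
    order = sorted(range(len(columns)),
                   key=lambda j: len(nonzero_rows[j]), reverse=True)
    bundles = []  # each entry: (feature ids, rows where the bundle is nonzero)
    for j in order:
        for members, rows in bundles:
            if len(rows & nonzero_rows[j]) <= max_conflicts:
                members.append(j)
                rows.update(nonzero_rows[j])
                break
        else:
            # No compatible bundle found: start a new one.
            bundles.append(([j], set(nonzero_rows[j])))
    return [members for members, _ in bundles]
```

For example, `goss_sample(grads, a=0.2, b=0.1)` on 100 instances returns 30 indices: the 20 with the largest absolute gradients at weight 1, plus 10 randomly drawn others at weight 8. Histogram construction and split finding would then run only on that subset.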