An AUC-based permutation variable importance measure for random forests

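Permutation (music) Random permutation Random forest Ranking (information retrieval) Resampling Receiver operating characteristic Algorithm Computer science Statistics Class (philosophy) Mathematics Data mining Artificial intelligence Machine learning Combinatorics Physics Block (permutation group theory) Acoustics
Authors
Silke Janitza, Carolin Strobl, Anne‐Laure Boulesteix
Source
Journal: BMC Bioinformatics [Springer Nature]
Volume (issue): 14 (1) Citations: 177
Identifier
DOI: 10.1186/1471-2105-14-119
Abstract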

The random forest (RF) method is a commonly used tool for classification with high-dimensional data as well as for ranking candidate predictors based on the so-called random forest variable importance measures (VIMs). However, the classification performance of RF is known to be suboptimal in the case of strongly unbalanced data, i.e. data where response class sizes differ considerably. Suggestions have been made to obtain better classification performance based either on sampling procedures or on cost sensitivity analyses. However, to our knowledge, the performance of the VIMs has not yet been examined in the case of unbalanced response classes. In this paper we explore the performance of the permutation VIM for unbalanced data settings and introduce an alternative permutation VIM based on the area under the curve (AUC) that is expected to be more robust towards class imbalance.

We investigated the performance of the standard permutation VIM and of our novel AUC-based permutation VIM for different class imbalance levels using simulated data and real data. The results suggest that the new AUC-based permutation VIM outperforms the standard permutation VIM for unbalanced data settings, while both permutation VIMs have equal performance for balanced data settings.

The standard permutation VIM loses its ability to discriminate between associated predictors and predictors not associated with the response as class imbalance increases. It is outperformed by our new AUC-based permutation VIM for unbalanced data settings, while the performance of both VIMs is very similar in the case of balanced classes. The new AUC-based VIM is implemented in the R package party for the unbiased RF variant based on conditional inference trees. The code implementing our study is available from the companion website: http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/070_drittmittel/janitza/index.html
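To make the contrast between the two permutation VIMs concrete, the following is a minimal Python sketch, not the authors' implementation and not the code in the R package party. It rests on several assumptions that are not in the abstract: a single fixed scoring function (the hypothetical `score_fn`) stands in for a fitted forest, importance is computed on the full toy data set rather than per tree on out-of-bag observations, and the data are invented for illustration. Importance is measured as the average decrease in AUC, or the average increase in error rate, after randomly permuting one predictor.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one observation per class")
    # Ties count as half a win (Mann-Whitney formulation).
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def error_rate(scores, labels, threshold=0.5):
    """Share of misclassified observations; scores above threshold predict class 1."""
    return sum((s > threshold) != (y == 1) for s, y in zip(scores, labels)) / len(labels)


def permute_column(X, j, rng):
    """Return a copy of X with column j randomly shuffled across rows."""
    column = [row[j] for row in X]
    rng.shuffle(column)
    return [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]


def permutation_importance(score_fn, X, y, j, rng, n_repeats=20):
    """Average (AUC decrease, error-rate increase) after permuting predictor j."""
    base = [score_fn(row) for row in X]
    base_auc, base_err = auc(base, y), error_rate(base, y)
    auc_drop = err_rise = 0.0
    for _ in range(n_repeats):
        permuted = [score_fn(row) for row in permute_column(X, j, rng)]
        auc_drop += base_auc - auc(permuted, y)
        err_rise += error_rate(permuted, y) - base_err
    return auc_drop / n_repeats, err_rise / n_repeats


rng = random.Random(0)

# Invented, strongly unbalanced toy data: 10 observations of class 1, 190 of class 0.
# Column 0 separates the classes; column 1 is pure noise.
y = [1] * 10 + [0] * 190
X = [[1.0 + k / 10, rng.random()] for k in range(10)]
X += [[i / 190, rng.random()] for i in range(190)]


def score_fn(row):
    """Hypothetical scorer: ranks by column 0 but never exceeds the 0.5 threshold."""
    return 0.05 + 0.2 * row[0]


auc_imp_x0, err_imp_x0 = permutation_importance(score_fn, X, y, 0, rng)
auc_imp_x1, err_imp_x1 = permutation_importance(score_fn, X, y, 1, rng)
print("informative predictor:", auc_imp_x0, err_imp_x0)
print("noise predictor:      ", auc_imp_x1, err_imp_x1)
```

In this toy setting every observation is assigned to the majority class, so permuting the informative predictor leaves the error rate unchanged and the error-rate-based importance is zero, whereas the AUC-based importance is clearly positive because the ranking of the scores is destroyed. This is only one illustration of how an error-rate-based measure can lose sensitivity under class imbalance; it does not reproduce the simulation design of the study.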