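Random forest
Computer science
Measure (data warehouse)
Machine learning
Simplicity (philosophy)
Naive Bayes classifier
Artificial intelligence
Feature (linguistics)
Decision tree
Outcome (game theory)
Coding (set theory)
Support vector machine
Variable (mathematics)
Big data
Data mining
Mathematics
Set (abstract data type)
Mathematical economics
Programming language
Philosophy
Mathematical analysis
Epistemology
Linguistics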
Authors
Brandon Greenwell, Bradley C. Boehmke, Andrew J. McCarthy
Source
Journal: Cornell University - arXiv
Date: 2018-01-01
Citations: 98
Identifier
DOI: 10.48550/arxiv.1805.04755
Abstract
In the era of "big data", it is becoming more of a challenge not only to build state-of-the-art predictive models, but also to gain an understanding of what is really going on in the data. For example, it is often of interest to know which, if any, of the predictors in a fitted model are relatively influential on the predicted outcome. Some modern algorithms, like random forests and gradient boosted decision trees, have a natural way of quantifying the importance or relative influence of each feature. Other algorithms, like naive Bayes classifiers and support vector machines, are not capable of doing so, and model-free approaches are generally used to measure each predictor's importance. In this paper, we propose a standardized, model-based approach to measuring predictor importance across the growing spectrum of supervised learning algorithms. Our proposed method is illustrated through both simulated and real data examples. The R code to reproduce all of the figures in this paper is available in the supplementary materials.
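
The abstract does not spell out how the proposed model-based measure is computed. As a rough illustration only, the Python sketch below assumes one plausible reading: a predictor is scored by how much the fitted model's partial dependence curve varies over that predictor's range, so that a flat curve means low importance. The names `partial_dependence`, `pd_importance`, and `toy_model` are hypothetical, the grid size and the use of the population standard deviation are arbitrary choices, and the authors' own code (in R, per the abstract) may differ.

```python
from statistics import mean, pstdev


def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        preds = []
        for row in rows:
            modified = list(row)       # copy so the data set is not mutated
            modified[feature] = value  # overwrite only the feature of interest
            preds.append(predict(modified))
        curve.append(mean(preds))
    return curve


def pd_importance(predict, rows, feature, grid_size=10):
    """Score a feature by the spread of its partial dependence curve (assumption)."""
    values = [row[feature] for row in rows]
    lo, hi = min(values), max(values)
    # Evenly spaced grid across the observed range of the feature.
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    return pstdev(partial_dependence(predict, rows, feature, grid))


def toy_model(x):
    """Stand-in for any fitted model: strong effect of x[0], weak x[1], none for x[2]."""
    return 3.0 * x[0] + 0.5 * x[1]


# Small synthetic data set; every feature spans the interval [0, 1].
rows = [[i / 9, (i * 7 % 10) / 9, (i * 3 % 10) / 9] for i in range(10)]

# One score per feature; only `predict` is needed, not the model internals.
scores = [pd_importance(toy_model, rows, j) for j in range(3)]
```

Because the sketch only calls `predict`, the same scoring function could in principle be applied to a naive Bayes classifier or a support vector machine as readily as to a tree ensemble, which is the kind of uniformity across algorithms the abstract describes.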