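Relevance (law)
Computer science
Feature selection
Artificial intelligence
Machine learning
Selection (genetic algorithm)
Partition (number theory)
ID3
Feature (linguistics)
Data mining
Decision tree
Mathematics
Decision tree learning
Combinatorics
Political science
Law
Linguistics
Philosophy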
Authors
George H. John, Ron Kohavi, Karl Pfleger
Source
Journal: Elsevier eBooks
[Elsevier]
Date: 1994-01-01
Pages: 121-129
Citations: 1719
Identifier
DOI: 10.1016/b978-1-55860-335-6.50023-4
Abstract
We address the problem of finding a subset of features that allows a supervised induction algorithm to induce small high-accuracy concepts. We examine notions of relevance and irrelevance, and show that the definitions used in the machine learning literature do not adequately partition the features into useful categories of relevance. We present definitions for irrelevance and for two degrees of relevance. These definitions improve our understanding of the behavior of previous subset selection algorithms, and help define the subset of features that should be sought. The features selected should depend not only on the features and the target concept, but also on the induction algorithm. We describe a method for feature subset selection using cross-validation that is applicable to any induction algorithm, and discuss experiments conducted with ID3 and C4.5 on artificial and real datasets.
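The cross-validation-based subset selection described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the abstract does not state the search strategy, so greedy forward search is used; `table_learner` is a toy stand-in for an induction algorithm (it is not ID3 or C4.5); and the names `cv_accuracy` and `forward_select` are hypothetical.

```python
import random
from collections import Counter, defaultdict
from itertools import product


def table_learner(train_X, train_y):
    """Toy inducer (assumption): majority label per feature-value tuple."""
    default = Counter(train_y).most_common(1)[0][0]
    votes = defaultdict(Counter)
    for x, label in zip(train_X, train_y):
        votes[tuple(x)][label] += 1
    table = {key: c.most_common(1)[0][0] for key, c in votes.items()}
    return lambda x: table.get(tuple(x), default)


def cv_accuracy(X, y, features, learner, k=5, seed=0):
    """Estimate accuracy of `learner` restricted to `features` by k-fold CV."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]

    def project(row):
        return [row[f] for f in features]

    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = learner([project(X[i]) for i in train], [y[i] for i in train])
        correct += sum(model(project(X[i])) == y[i] for i in fold)
    return correct / len(X)


def forward_select(X, y, learner, k=5):
    """Greedy forward search: add a feature only if CV accuracy improves."""
    selected, remaining = [], set(range(len(X[0])))
    best = cv_accuracy(X, y, selected, learner, k)
    while remaining:
        scored = [(cv_accuracy(X, y, selected + [f], learner, k), f)
                  for f in sorted(remaining)]
        score, feat = max(scored, key=lambda t: t[0])
        if score <= best:  # no candidate helps this learner; stop
            break
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected, best


# Artificial data: four binary features; only feature 0 determines the label.
X = [list(bits) for bits in product([0, 1], repeat=4)] * 4
y = [row[0] for row in X]
selected, accuracy = forward_select(X, y, table_learner)
print(selected, accuracy)
```

On this artificial dataset the search keeps only feature 0 and discards the three irrelevant features. Because the learner is passed in as an argument, the same loop can wrap any induction algorithm, which is the property the abstract emphasizes.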