Cross-validation
Naive Bayes classifier
Model selection
Computer science
Classifier (UML)
Selection (genetic algorithm)
Feature selection
Artificial intelligence
Computation
Statistics
Data mining
Machine learning
Pattern recognition (psychology)
Mathematics
Support vector machine
Algorithm
Source
Venue: International Joint Conference on Artificial Intelligence
Date: 1995-08-20
Volume/Pages: 2, 1137-1143
Citations: 9899
Abstract
We review accuracy estimation methods and compare the two most common methods: cross-validation and bootstrap. Recent experimental results on artificial data and theoretical results in restricted settings have shown that for selecting a good classifier from a set of classifiers (model selection), ten-fold cross-validation may be better than the more expensive leave-one-out cross-validation. We report on a large-scale experiment (over half a million runs of C4.5 and a Naive-Bayes algorithm) to estimate the effects of different parameters of these algorithms on real-world datasets. For cross-validation we vary the number of folds and whether the folds are stratified or not; for bootstrap, we vary the number of bootstrap samples. Our results indicate that for real-world datasets similar to ours, the best method to use for model selection is ten-fold stratified cross-validation, even if computation power allows using more folds.
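To make the recommendation concrete, here is a minimal sketch of ten-fold stratified cross-validation used for model selection. It is written with scikit-learn for illustration; the dataset, the specific estimators (a decision tree standing in for C4.5, Gaussian Naive Bayes for the Naive-Bayes learner), and the random seed are assumptions, not the paper's original experimental setup.

```python
# Minimal sketch: ten-fold stratified cross-validation for model selection.
# The dataset and estimators below are illustrative stand-ins, not the
# paper's original C4.5 / Naive-Bayes implementations.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # any labeled real-world dataset

# Ten folds, stratified so each fold preserves the overall class proportions.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

candidates = {
    "decision tree (C4.5-like)": DecisionTreeClassifier(random_state=0),
    "naive bayes": GaussianNB(),
}

# Model selection: keep the candidate with the best mean cross-validated accuracy.
for name, clf in candidates.items():
    scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Stratification matters here because it keeps the class distribution in each fold close to that of the full dataset, which reduces the variance of the accuracy estimate, one of the effects the paper's experiments measure.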