Cross-validation
Computer science
Model validation
Ideal (ethics)
Quality (philosophy)
Data mining
Value (mathematics)
Fold (higher-order function)
Artificial intelligence
Machine learning
Data science
Epistemology
Philosophy
Programming language
Authors
Sanjay Yadav, Sanyam Shukla
Source
Venue: International Conference on Advanced Computing
Date: 2016-02-01
Citations: 231
Abstract
While training a model on data from a dataset, we have to think of an ideal way to do so. The training should be done in such a way that the model has enough instances to train on without over-fitting; at the same time, if there are not enough instances to train on, the model will not be trained properly and will give poor results when used for testing. Accuracy is important in classification, and one should always strive for the highest accuracy, provided there is no trade-off against unacceptable runtime. When working on small datasets, the ideal choices are k-fold cross-validation with a large value of k (but smaller than the number of instances) or leave-one-out cross-validation, whereas on colossal datasets the first thought is generally to use hold-out validation. This article studies the differences between the two validation schemes and analyzes the possibility of using k-fold cross-validation over hold-out validation even on large datasets. Experimentation was performed on four large datasets, and the results show that, up to a certain threshold, k-fold cross-validation with the value of k varied with respect to the number of instances can indeed be used over hold-out validation for quality classification.
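The two validation schemes compared in the abstract can be sketched in plain Python. This is a minimal illustration of the index-splitting logic only, not the paper's experimental setup; the function names and the 0.2 hold-out fraction are illustrative assumptions.

```python
def k_fold_indices(n, k):
    """Partition indices 0..n-1 into k contiguous, near-equal folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def k_fold_splits(n, k):
    """Yield (train, test) index lists, one pair per fold.

    Each fold serves once as the test set; setting k == n gives
    leave-one-out cross-validation.
    """
    folds = k_fold_indices(n, k)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def holdout_split(n, test_fraction=0.2):
    """Single train/test split; the 0.2 fraction is an assumed default."""
    n_test = int(n * test_fraction)
    return list(range(n - n_test)), list(range(n - n_test, n))
```

In practice the indices would be shuffled before splitting. The sketch also makes the cost difference visible: hold-out trains one model, while k-fold trains k models on roughly n·(k-1)/k instances each, which is why k-fold becomes expensive on colossal datasets as k grows.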