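Computer science
Cross-validation
Model selection
Selection (genetic algorithm)
Model validation
Set (abstract data type)
Data mining
Data set
Data validation
Machine learning
Training set
Task (project management)
Bootstrap aggregating
Artificial intelligence
Database
Management
Data science
Economics
Programming language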
Authors
Rafael Savvides, Jarmo Mäkelä, Kai Puolamäki
Abstract
Model selection is one of the most central tasks in supervised learning. Validation set methods are the standard way to accomplish this task: models are trained on training data, and the model with the smallest loss on the validation data is selected. However, it is generally not obvious how much validation data is required to make a reliable selection, which is essential when labeled data are scarce or expensive. We propose a bootstrap-based algorithm, bootstrap validation (BSV), that uses the bootstrap to adjust the validation set size and to find the best-performing model within a tolerance parameter specified by the user. We find that BSV works well in practice and can be used as a drop-in replacement for validation set methods or k-fold cross-validation. The main advantage of BSV is that less validation data is typically needed, so more data can be used to train the model, resulting in better approximations and efficient use of validation data.
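The abstract gives no algorithmic details, so the following is only a minimal sketch of the general idea, not the authors' BSV procedure. It assumes that per-example validation losses are already available for each candidate model, and it uses a simple stopping rule that is purely an assumption: grow the validation set until the bootstrap fraction of resamples in which the selected model is within the tolerance of the best model reaches a chosen confidence level. All function names and parameters (`bootstrap_selection`, `grow_validation_set`, `confidence`, `start`, `step`) are hypothetical.

```python
import numpy as np


def bootstrap_selection(losses, tolerance, n_boot=500, seed=0):
    """Pick the model with the smallest mean validation loss and estimate,
    via the bootstrap, how often it is within `tolerance` of the best model.

    losses: array of shape (n_models, n_val) with per-example losses.
    Returns (chosen_index, fraction_of_resamples_within_tolerance).
    """
    rng = np.random.default_rng(seed)
    losses = np.asarray(losses, dtype=float)
    n_models, n_val = losses.shape

    # Standard validation set method: smallest mean loss wins.
    chosen = int(np.argmin(losses.mean(axis=1)))

    # Resample validation examples with replacement, n_boot times.
    idx = rng.integers(0, n_val, size=(n_boot, n_val))
    # boot_means[b, m] = mean loss of model m on bootstrap resample b.
    boot_means = losses[:, idx].mean(axis=2).T

    # Excess loss of the chosen model over the best model in each resample.
    regret = boot_means[:, chosen] - boot_means.min(axis=1)
    return chosen, float(np.mean(regret <= tolerance))


def grow_validation_set(losses, tolerance, confidence=0.95,
                        start=20, step=20, n_boot=500, seed=0):
    """Use only as many validation examples as needed (assumed stopping rule).

    Returns (chosen_index, n_examples_used, bootstrap_fraction).
    """
    losses = np.asarray(losses, dtype=float)
    n_total = losses.shape[1]
    n = min(start, n_total)
    while True:
        chosen, frac = bootstrap_selection(losses[:, :n], tolerance, n_boot, seed)
        if frac >= confidence or n >= n_total:
            return chosen, n, frac
        n = min(n + step, n_total)


if __name__ == "__main__":
    # Synthetic per-example losses for three hypothetical models.
    rng = np.random.default_rng(1)
    demo = np.stack([rng.normal(mu, 0.2, size=400) for mu in (0.30, 0.35, 0.60)])
    print(grow_validation_set(demo, tolerance=0.05))
```

In this sketch, validation examples that are never consumed could in principle be returned to the training set, which is the kind of saving the abstract describes; how BSV actually sets the validation set size is specified in the paper itself.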