Cross-validation
Authors
Martin Gütlein, Christoph Helma, Andreas Karwath, Stefan Kramer
Identifiers
DOI: 10.1002/minf.201200134
Abstract
(Q)SAR model validation is essential to ensure the quality of inferred models and to indicate their future predictivity on unseen compounds. Proper validation is also one of the requirements of regulatory authorities for accepting a (Q)SAR model and approving its use in real-world scenarios as an alternative testing method. At the same time, however, the question of how to validate a (Q)SAR model, in particular whether to employ variants of cross-validation or external test set validation, is still under discussion. In this paper, we empirically compare k-fold cross-validation with external test set validation. To this end we introduce a workflow that realistically simulates the common problem setting of building predictive models for relatively small datasets. The workflow applies the built and validated models to large amounts of unseen data and compares the performance of the different validation approaches. The experimental results indicate that cross-validation produces better-performing (Q)SAR models than external test set validation and reduces the variance of the results, while at the same time underestimating the performance on unseen compounds. The results reported in this paper suggest that, contrary to the current conception in the community, cross-validation may play a significant role in evaluating the predictivity of (Q)SAR models.
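The two validation schemes compared in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' workflow: it uses a toy one-dimensional dataset and a 1-nearest-neighbour classifier (both my own stand-ins) to show how k-fold cross-validation averages over k held-out folds, whereas external test set validation scores a single fixed train/test split.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def one_nn_predict(train_x, train_y, x):
    """Predict the label of the nearest training point (1-NN)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

def accuracy(train_x, train_y, test_x, test_y):
    """Fraction of test points whose 1-NN prediction matches the true label."""
    hits = sum(one_nn_predict(train_x, train_y, x) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_x)

def cross_validate(xs, ys, k=5):
    """k-fold CV: each fold is held out exactly once; average fold accuracies."""
    scores = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        tr_x = [x for i, x in enumerate(xs) if i not in held_out]
        tr_y = [y for i, y in enumerate(ys) if i not in held_out]
        te_x = [xs[i] for i in fold]
        te_y = [ys[i] for i in fold]
        scores.append(accuracy(tr_x, tr_y, te_x, te_y))
    return sum(scores) / k

# Toy dataset: the label is 1 iff x > 0 (a stand-in for "active" compounds).
xs = [i / 10 for i in range(-20, 20)]
ys = [1 if x > 0 else 0 for x in xs]

# Scheme 1: 5-fold cross-validation over the whole dataset.
cv_score = cross_validate(xs, ys, k=5)

# Scheme 2: external test set validation, i.e. one fixed train/test split
# (here: the first fold of a fresh shuffle serves as the external test set).
test_fold = k_fold_indices(len(xs), 5, seed=1)[0]
held_out = set(test_fold)
ext_score = accuracy([x for i, x in enumerate(xs) if i not in held_out],
                     [y for i, y in enumerate(ys) if i not in held_out],
                     [xs[i] for i in test_fold],
                     [ys[i] for i in test_fold])

print("5-fold CV accuracy:      ", round(cv_score, 3))
print("external test accuracy:  ", round(ext_score, 3))
```

Note that the CV estimate is an average over five scores, while the external estimate rests on a single split of eight points, which is why the paper observes higher variance for external test set validation on small datasets.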