厕所
数量结构-活动关系
适用范围
交叉验证
试验装置
集合(抽象数据类型)
预测值
人工智能
分子描述符
机器学习
数学
计算机科学
医学
内科学
程序设计语言
作者
Alexander Golbraikh,Alexander Tropsha
标识
DOI:10.1016/s1093-3263(01)00123-1
摘要
Validation is a crucial aspect of any quantitative structure–activity relationship (QSAR) modeling. This paper examines one of the most popular validation criteria, leave-one-out cross-validated R2 (LOO q2). Often, a high value of this statistical characteristic (q2>0.5) is considered as a proof of the high predictive ability of the model. In this paper, we show that this assumption is generally incorrect. In the case of 3D QSAR, the lack of the correlation between the high LOO q2 and the high predictive ability of a QSAR model has been established earlier [Pharm. Acta Helv. 70 (1995) 149; J. Chemomet. 10 (1996) 95; J. Med. Chem. 41 (1998) 2553]. In this paper, we use two-dimensional (2D) molecular descriptors and k nearest neighbors (kNN) QSAR method for the analysis of several datasets. No correlation between the values of q2 for the training set and predictive ability for the test set was found for any of the datasets. Thus, the high value of LOO q2 appears to be the necessary but not the sufficient condition for the model to have a high predictive power. We argue that this is the general property of QSAR models developed using LOO cross-validation. We emphasize that the external validation is the only way to establish a reliable QSAR model. We formulate a set of criteria for evaluation of predictive ability of QSAR models.
科研通智能强力驱动
Strongly Powered by AbleSci AI