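Chemical space
Quantitative structure-activity relationship
Set (abstract data type)
Computer science
Training set
Artificial intelligence
Function (biology)
Mean squared error
Machine learning
Test set
Data mining
Chemistry
Drug discovery
Mathematics
Statistics
Biology
Evolutionary biology
Programming language
Biochemistry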
Authors
Robert P. Sheridan,J. Chris Culberson,Elizabeth Joshi,Matthew Tudor,Prabha Karnachi
Identifier
DOI:10.1021/acs.jcim.2c00699
Abstract
As with many other institutions, our company maintains many quantitative structure-activity relationship (QSAR) models of absorption, distribution, metabolism, excretion, and toxicity (ADMET) end points and updates the models regularly. We recently examined version-to-version predictivity for these models over a period of 10 years. In this approach we monitor the goodness of prediction of new molecules relative to the training set of model version V before they are incorporated in the updated model V+1. Using a cell-based permeability assay (Papp) as an example, we illustrate how the QSAR models made from this data are generally predictive and can be utilized to enrich chemical designs and synthesis. Despite the obvious utility of these models, we turned up unexpected behavior in Papp and other ADMET activities for which the explanation is not obvious. One such behavior is that the apparent predictivity of the models as measured by root-mean-square-error can vary greatly from version to version and is sometimes very poor. One intuitively appealing explanation is that the observed activities of the new molecules fall outside the bulk of activities in the training set. Alternatively, one may think that the new molecules are exploring different regions of chemical space than the training set. However, the real explanation has to do with activity cliffs. If the observed activities of the new molecules are different than expected based on similar molecules in the training set, the predictions will be less accurate. This is true for all our ADMET end points.
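The version-to-version check described in the abstract (predict the new molecules with model version V, score them by root-mean-square error, then fold them into the training set for V+1) can be sketched as follows. This is only an illustrative sketch, not the authors' code: the names `rmse`, `version_to_version_rmse`, and `fit_mean` are hypothetical, the data are made up, and the mean-of-training-set predictor is a placeholder standing in for a real QSAR model.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted activities."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def version_to_version_rmse(batches, fit):
    """Score each batch of new molecules with the model built before it.

    batches: chronological list of batches; each batch is a list of
             (features, activity) pairs. batches[0] seeds the first model.
    fit:     hypothetical callable that takes a training list and returns
             a predict(features) -> float function.
    Returns one RMSE per model version V, computed on the molecules that
    arrive before version V+1 is built.
    """
    training = list(batches[0])
    scores = []
    for new_batch in batches[1:]:
        predict = fit(training)  # model version V
        observed = [activity for _, activity in new_batch]
        predicted = [predict(features) for features, _ in new_batch]
        scores.append(rmse(observed, predicted))
        training.extend(new_batch)  # new data feed version V+1
    return scores


def fit_mean(training):
    """Placeholder 'model' (assumption): always predicts the training mean."""
    mean = sum(activity for _, activity in training) / len(training)
    return lambda features: mean


# Made-up toy data: (molecule label, measured activity).
toy_batches = [
    [("mol_a", 1.0), ("mol_b", 3.0)],
    [("mol_c", 2.0), ("mol_d", 4.0)],
    [("mol_e", 2.5)],
]
toy_scores = version_to_version_rmse(toy_batches, fit_mean)
print(toy_scores)
```

In practice `fit` would wrap whatever learner and molecular descriptors are in use; the loop only fixes the order of operations, so that every molecule is predicted before it is ever used for training.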