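Chemistry
Bioinformatics
Molecular descriptor
Chemical space
Drug database
Random forest
Artificial intelligence
Quantitative structure-activity relationship
Computational biology
Molecular dynamics
Machine learning
Computer science
Cheminformatics
Encoding
Support vector machine
Drug discovery
Training set
PubChem
Biological system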
Drug
Biology
Biochemistry
Gene
Computational chemistry
Pharmacology
Authors
Carmen Esposito, Shuzhe Wang, Udo E. W. Lange, Frank Oellien, Sereina Riniker
Identifier
DOI: 10.1021/acs.jcim.0c00525
Abstract
The efflux transporter P-glycoprotein (P-gp) is responsible for the extrusion of a wide variety of molecules, including drug molecules, from the cell. Therefore, P-gp-mediated efflux transport limits the bioavailability of drugs. To identify potential P-gp substrates early in the drug discovery process, in silico models have been developed based on structural and physicochemical descriptors. In this study, we investigate the use of molecular dynamics fingerprints (MDFPs) as an orthogonal descriptor for the training of machine learning (ML) models to classify small molecules into substrates and nonsubstrates of P-gp. MDFPs encode the information from short MD simulations of the molecules in different environments (water, membrane, or protein pocket). The performance of the MDFPs, evaluated on both an in-house dataset (3930 compounds) and a public dataset from ChEMBL (1114 compounds), is compared to that of commonly used 2D molecular descriptors, including structure-based and property-based descriptors. We find that all tested classifiers interpolate well, achieving high accuracy on chemically diverse subsets. However, by challenging the models with external validation and prospective analysis, we show that only tree-based ML models trained on MDFPs or property-based descriptors generalize well to regions of the chemical space not covered by the training set.