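Random forest
Support vector machine
Set (abstract data type)
Test set
Data set
Computer science
Sulfonamide
Artificial intelligence
Machine learning
Chemistry
Stereochemistry
Programming language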
Authors
Zijian Qin, Yao Xi, Shengde Zhang, Guiping Tu, Aixia Yan
Identifier
DOI:10.1021/acs.jcim.8b00876
Abstract
This work reports a classification study conducted on the largest COX-2 inhibitor data set to date. Using 2925 diverse COX-2 inhibitors collected from 168 publications, we applied two machine learning methods, support vector machine (SVM) and random forest (RF), to develop 12 classification models. The best SVM and RF models achieved Matthews correlation coefficient (MCC) values of 0.73 and 0.72, respectively. The 2925 COX-2 inhibitors were then reduced to a data set of 1630 molecules by removing intermediately active inhibitors, and 12 new classification models were constructed, yielding MCC values above 0.72. On the external test set, the best MCC value was 0.68, obtained by the RF model using ECFP_4 fingerprints. Moreover, the 2925 COX-2 inhibitors were clustered into eight subsets, and the structural features of each subset were investigated. We identified substructures important for activity, including halogen, carboxyl, sulfonamide, and methanesulfonyl groups, as well as aromatic nitrogen atoms. The models developed in this study could serve as useful tools for compound screening prior to laboratory testing.
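The abstract compares models by MCC, so a small worked example of that metric may help readers interpret values such as 0.68 or 0.73. The sketch below computes the standard binary MCC from confusion-matrix counts. The function name and the toy labels are illustrative assumptions and are not taken from the paper or its data.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Binary MCC, with 1 = active inhibitor and 0 = inactive (assumed labels)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when any marginal count is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy labels for illustration only (not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(matthews_corrcoef(y_true, y_pred))
```

MCC ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (perfect agreement). Because it uses all four confusion-matrix counts, it stays informative when active and inactive classes are imbalanced.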