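Resampling
Oversampling
False positive paradox
Artificial intelligence
Computer science
Support vector machine
Machine learning
True positive rate
Data set
Sensitivity (control systems)
False positive rate
Molecular descriptor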
Test set
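Set (abstract data type)
Data mining
Blood–brain barrier
Pattern recognition (psychology)
Quantitative structure–activity relationship
Engineering
Biology
Neuroscience
Computer network
Programming language
Central nervous system
Bandwidth (computing)
Electronic engineering
Authors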
Zhuang Wang, Hongbin Yang, Zengrui Wu, Tianduanyi Wang, Weihua Li, Yun Tang, Guixia Liu
Source
Journal: ChemMedChem
[Wiley]
Date: 2018-08-15
Volume (issue): 13 (20): 2189-2201
Citations: 138
Identifier
DOI: 10.1002/cmdc.201800533
Abstract
The blood–brain barrier (BBB), as a part of absorption, protects the central nervous system by separating brain tissue from the bloodstream. In recent years, BBB permeability has become a critical issue in chemical ADMET prediction, but almost all models were built using imbalanced data sets, which caused a high false‐positive rate. Therefore, we tried to solve the problem of biased data sets and built a reliable classification model with 2358 compounds. Machine learning and resampling methods were used together to refine the models, with both 2D molecular descriptors and molecular fingerprints representing the chemicals. Through a series of evaluations, we found that resampling methods such as the Synthetic Minority Oversampling Technique (SMOTE) and SMOTE+edited nearest neighbor could effectively solve the problem of imbalanced data sets, and that the MACCS fingerprint combined with a support vector machine performed best. After the final construction of a consensus model, the overall accuracy rate increased to 0.966 for the final external data set. The accuracy rate of the model for the test set was 0.919, with a well-balanced capacity to predict BBB‐positive compounds (sensitivity 0.925) and BBB‐negative compounds (specificity 0.899). Compared with other BBB classification models, our models reduced the rate of false positives and were more robust in predicting both BBB‐positive and BBB‐negative compounds, which should be helpful in early drug discovery.
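The abstract names two ideas that are easy to illustrate in a few lines: sensitivity and specificity as class-wise accuracy, and SMOTE's core step of interpolating between minority-class neighbours. What follows is a minimal sketch in plain Python, not the authors' pipeline. The function names, the confusion-matrix counts and the toy two-feature points are all hypothetical and chosen only for illustration.

```python
import math
import random


def sensitivity(tp, fn):
    """True positive rate: fraction of actual positives predicted positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: fraction of actual negatives predicted negative."""
    return tn / (tn + fp)


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE's core idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of base among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical confusion-matrix counts, chosen only for illustration.
tp, fn, tn, fp = 90, 10, 40, 10
print(round(sensitivity(tp, fn), 3))  # 0.9
print(round(specificity(tn, fp), 3))  # 0.8

# Hypothetical minority class with two feature values per compound.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like(minority, n_new=6)
print(len(new_points))  # 6
```

Each synthetic point lies on a line segment between two real minority samples, so oversampling adds examples inside the region the minority class already occupies instead of duplicating existing rows. A production workflow would more likely rely on an established resampling library than on a hand-written routine like this one.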