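Machine learning
Artificial intelligence
Computer science
Support vector machine
Random forest
Betweenness centrality
Drug database
Cluster analysis
Artificial neural network
Classifier (UML)
Data mining
Centrality
Drug
Medicine
Combinatorics
Psychiatry
Mathematics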
Authors
Cristiano Galletti, Joaquim Aguirre‐Plans, Baldo Oliva, Narcís Fernández‐Fuentes
Identifiers
DOI:10.3389/fbinf.2022.906644
Abstract
Drug discovery attrition rates, particularly at advanced clinical trial stages, are high because of unexpected adverse drug reactions (ADRs) elicited by novel drug candidates. Predicting the undesirable ADRs produced by modulating certain protein targets would contribute to developing safer drugs, thereby reducing the economic losses associated with high attrition rates. As opposed to the more traditional drug-centric approach, we propose a target-centric approach to predict associations between protein targets and ADRs. The predictor is implemented as a machine learning classifier that integrates a set of eight independent network-based features. These include a network diffusion-based score, protein modules identified with network clustering algorithms, functional similarity among proteins, network distance to proteins that are part of the safety panels used in preclinical drug development, network descriptors in the form of degree and betweenness centrality measurements, and conservation. This diverse set of descriptors was used to build predictors with different machine learning classifiers, ranging from specific models for individual ADRs to higher levels of abstraction in the MedDRA hierarchy, such as system organ class. The results obtained from the different machine learning classifiers, namely support vector machine, random forest, and neural network, were further combined into a meta-predictor exploiting three voting systems, namely jury vote, consensus vote, and red flag, yielding different models for each of the ADRs analyzed. The accuracy of the predictors is high enough to support the identification of problematic protein targets, both at the level of individual ADRs and for sets of related ADRs grouped into common system organ classes. For example, the predictor for ventricular tachycardia achieved an accuracy of 0.83, a precision of 0.90, and a Matthews correlation coefficient of 0.70.
We believe that this approach is a good complement to existing methodologies devised to foresee potential liabilities in preclinical drug discovery. The method is available through the DocTOR utility on GitHub (https://github.com/cristian931/DocTOR).
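The abstract names three voting systems for the meta-predictor without defining them. As a minimal sketch, assuming common interpretations (jury vote as a strict majority of the base classifiers, consensus vote as unanimous agreement, and red flag as a positive call from any single classifier), the snippet combines per-pair predictions from the three base classifiers and scores each rule with the metrics the abstract reports (accuracy, precision, Matthews correlation coefficient). The function names (`jury_vote`, `consensus_vote`, `red_flag`, `meta_predict`, `evaluate`) and the toy predictions are hypothetical and are not taken from DocTOR.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical voting rules: assumed interpretations, not DocTOR's code.
VoteRule = Callable[[Sequence[int]], int]


def jury_vote(votes: Sequence[int]) -> int:
    """Jury vote (assumed): positive if a strict majority of classifiers say 1."""
    return int(2 * sum(votes) > len(votes))


def consensus_vote(votes: Sequence[int]) -> int:
    """Consensus vote (assumed): positive only if every classifier says 1."""
    return int(all(votes))


def red_flag(votes: Sequence[int]) -> int:
    """Red flag (assumed): positive if any single classifier says 1."""
    return int(any(votes))


def meta_predict(base: Dict[str, List[int]], rule: VoteRule) -> List[int]:
    """Apply a voting rule to each (target, ADR) pair across all base classifiers."""
    return [rule(votes) for votes in zip(*base.values())]


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (accuracy, precision, Matthews correlation coefficient)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); 0 if undefined.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, precision, mcc


if __name__ == "__main__":
    # Toy, made-up predictions for eight (protein target, ADR) pairs.
    base = {
        "svm": [1, 1, 0, 0, 1, 0, 1, 0],
        "random_forest": [1, 0, 0, 1, 1, 0, 0, 0],
        "neural_net": [1, 1, 0, 0, 0, 0, 1, 1],
    }
    truth = [1, 1, 0, 0, 1, 1, 0, 0]
    rules = [("jury", jury_vote), ("consensus", consensus_vote), ("red_flag", red_flag)]
    for name, rule in rules:
        acc, prec, mcc = evaluate(truth, meta_predict(base, rule))
        print(f"{name}: accuracy={acc:.3f} precision={prec:.3f} MCC={mcc:.3f}")
```

On this toy data the script prints `jury: accuracy=0.750 precision=0.750 MCC=0.500`, `consensus: accuracy=0.625 precision=1.000 MCC=0.378`, and `red_flag: accuracy=0.500 precision=0.500 MCC=0.000`. Under these assumed definitions, consensus vote trades missed associations for precision, while red flag flags more candidate targets at the cost of more false positives, which may suit a safety-screening setting where a missed liability is costlier than a false alarm.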