Critical Assessment of Artificial Intelligence Methods for Prediction of hERG Channel Inhibition in the “Big Data” Era

赫尔格大数据人工智能计算机科学频道（广播）计算生物学机器学习数据挖掘内科学生物医学钾通道计算机网络

作者

Vishal B. Siramshetty,Dac-Trung Nguyen,Natalia J. Martinez,Noel Southall,Anton Simeonov,Alexey V. Zakharov

出处

期刊：Journal of Chemical Information and Modeling [American Chemical Society]
日期：2020-12-01 卷期号：60 (12): 6007-6019 被引量：13

链接

figshare.comdoi.org

标识

DOI：10.1021/acs.jcim.0c00884

摘要

The rise of novel artificial intelligence (AI) methods necessitates their benchmarking against classical machine learning for a typical drug-discovery project. Inhibition of the potassium ion channel, whose alpha subunit is encoded by the human ether-a-go-go-related gene (hERG), leads to a prolonged QT interval of the cardiac action potential and is a significant safety pharmacology target for the development of new medicines. Several computational approaches have been employed to develop prediction models for the assessment of hERG liabilities of small molecules including recent work using deep learning methods. Here, we perform a comprehensive comparison of hERG effect prediction models based on classical approaches (random forests and gradient boosting) and modern AI methods [deep neural networks (DNNs) and recurrent neural networks (RNNs)]. The training set (∼9000 compounds) was compiled by integrating the hERG bioactivity data from the ChEMBL database with experimental data generated from an in-house, high-throughput thallium flux assay. We utilized different molecular descriptors including the latent descriptors, which are real-value continuous vectors derived from chemical autoencoders trained on a large chemical space (>1.5 million compounds). The models were prospectively validated on ∼840 in-house compounds screened in the same thallium flux assay. The best results were obtained with the XGBoost method and RDKit descriptors. The comparison of models based only on latent descriptors revealed that the DNNs performed significantly better than the classical methods. The RNNs that operate on SMILES provided the highest model sensitivity. The best models were merged into a consensus model that offered superior performance compared to reference models from academic and commercial domains. Furthermore, we shed light on the potential of AI methods to exploit the big data in chemistry and generate novel chemical representations useful in predictive modeling and tailoring a new chemical space.

求助该文献

Critical Assessment of Artificial Intelligence Methods for Prediction of hERG Channel Inhibition in the “Big Data” Era

今日热心研友