Authors
Daniel Vella,Jean-Paul Ebejer
Identifier
DOI: 10.1021/acs.jcim.2c00779
Abstract
The discovery of new hits through ligand-based virtual screening (LBVS) is essentially a low-data problem in drug discovery, because data acquisition is both difficult and expensive. The need for large amounts of training data hinders the application of conventional machine learning techniques in this domain. This work explores few-shot machine learning for hit discovery and lead optimization. Building on the state of the art, we introduce two metric-based meta-learning techniques, Prototypical Networks and Relation Networks, to this problem domain. We also compare two kinds of molecular embeddings as inputs to the classification networks: extended-connectivity fingerprints (ECFP) and embeddings learned by graph convolutional networks (GCN). Embeddings learned through GCNs consistently outperform ECFP in both the toxicity and the LBVS experiments. We conclude that the effectiveness of few-shot learning depends strongly on the nature of the data. Few-shot models struggle to perform consistently on the MUV and DUD-E data, in which the active compounds are structurally distinct. On the Tox21 data, however, the few-shot models perform well, and Prototypical Networks outperform the state of the art, which is based on the Matching Networks architecture. Training these networks is also substantially faster (up to 190%), so comparable or better results are obtained in a fraction of the training time.
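To make the contrast between the two metric-based approaches concrete, here is a minimal sketch of how a Prototypical Network scores a query compound, next to a simplified Matching-Networks-style attention over the support set. It assumes the molecular embeddings (for example ECFP bit vectors or GCN outputs) have already been computed; the learned embedding network, the episodic training loop, and the context-embedding refinements used by Matching Networks are all omitted. The function names, the `active`/`inactive` labels, and the toy 2-D vectors are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def squared_euclidean(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance, the metric used by Prototypical Networks."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def compute_prototypes(support: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Each class prototype is the mean of that class's support embeddings."""
    prototypes = {}
    for label, embeddings in support.items():
        dim = len(embeddings[0])
        prototypes[label] = [
            sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)
        ]
    return prototypes


def class_probabilities(query: Vector, prototypes: Dict[str, List[float]]) -> Dict[str, float]:
    """Softmax over negative squared distances from the query to each prototype."""
    logits = {label: -squared_euclidean(query, p) for label, p in prototypes.items()}
    max_logit = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - max_logit) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: v / total for label, v in exps.items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def matching_probabilities(query: Vector, support: Dict[str, List[Vector]]) -> Dict[str, float]:
    """Simplified Matching-Networks-style prediction (no context embeddings):
    softmax attention over every support example, summed per class."""
    sims = [(label, cosine(query, e)) for label, embs in support.items() for e in embs]
    max_sim = max(s for _, s in sims)
    weights = [(label, math.exp(s - max_sim)) for label, s in sims]
    total = sum(w for _, w in weights)
    probs = {label: 0.0 for label in support}
    for label, w in weights:
        probs[label] += w / total
    return probs


if __name__ == "__main__":
    # Hypothetical toy episode: two support embeddings per class.
    support = {
        "active": [[1.0, 1.0], [1.2, 0.8]],
        "inactive": [[-1.0, -1.0], [-0.8, -1.2]],
    }
    query = [0.9, 1.1]
    print("prototypical:", class_probabilities(query, compute_prototypes(support)))
    print("matching-style:", matching_probabilities(query, support))
```

One visible design difference is that the prototypical classifier compares the query against a single averaged embedding per class, whereas the matching-style classifier attends over every support example; this sketch does not attempt to reproduce or explain the training-time results reported above.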