生物信息学
计算机科学
药物发现
可扩展性
计算生物学
化学空间
蛋白质组
嵌入
人工智能
功能(生物学)
诱饵
药物靶点
机器学习
生物信息学
生物
基因
药理学
受体
数据库
遗传学
作者
Rohit Singh,Samuel Sledzieski,Bryan D. Bryson,Lenore Cowen,Bonnie Berger
标识
DOI:10.1073/pnas.2220778120
摘要
Sequence-based prediction of drug–target interactions has the potential to accelerate drug discovery by complementing experimental screens. Such computational prediction needs to be generalizable and scalable while remaining sensitive to subtle variations in the inputs. However, current computational techniques fail to simultaneously meet these goals, often sacrificing performance of one to achieve the others. We develop a deep learning model, ConPLex, successfully leveraging the advances in pretrained protein language models (“PLex”) and employing a protein-anchored contrastive coembedding (“Con”) to outperform state-of-the-art approaches. ConPLex achieves high accuracy, broad adaptivity to unseen data, and specificity against decoy compounds. It makes predictions of binding based on the distance between learned representations, enabling predictions at the scale of massive compound libraries and the human proteome. Experimental testing of 19 kinase-drug interaction predictions validated 12 interactions, including four with subnanomolar affinity, plus a strongly binding EPHB1 inhibitor ( K D = 1.3 nM). Furthermore, ConPLex embeddings are interpretable, which enables us to visualize the drug–target embedding space and use embeddings to characterize the function of human cell-surface proteins. We anticipate that ConPLex will facilitate efficient drug discovery by making highly sensitive in silico drug screening feasible at the genome scale. ConPLex is available open source at https://ConPLex.csail.mit.edu .
科研通智能强力驱动
Strongly Powered by AbleSci AI