Resource (disambiguation)
Computer science
Computational biology
Ligand (biochemistry)
Chemistry
Biology
Biochemistry
Receptor
Computer network
Authors
Janani Durairaj, Yusuf Adeshina, Zhonglin Cao, Xuejin Zhang, Vladimiras Oleinikovas, Thomas J. Duignan, Zachary D. McClure, Xavier Robin, Gabriel Studer, Daniel Kovtun, Emanuele Rossi, Guoqing Zhou, Srimukh Prasad Veccham, Clemens Isert, Yuxing Peng, Prabindh Sundareson, Mehmet Akdel, Gabriele Corso, H. Stärk, Gerardo Tauriello
Identifier
DOI:10.1101/2024.07.17.603955
Abstract
Protein-ligand interactions (PLI) are foundational to small-molecule drug design. With computational methods striving towards experimental accuracy, there is a critical demand for a well-curated and diverse PLI dataset. Existing datasets are often limited in size and diversity, and commonly used evaluation sets suffer from training information leakage, hindering the realistic assessment of method generalization capabilities. To address these shortcomings, we present PLINDER, the largest and most annotated dataset to date, comprising 449,383 PLI systems, each with over 500 annotations, similarity metrics at protein, pocket, interaction and ligand levels, and paired unbound (apo) and predicted structures. We propose an approach to generate training and evaluation splits that minimizes task-specific leakage and maximizes test set quality, and compare the resulting performance of DiffDock when retrained with different kinds of splits.