计算机科学
比例(比率)
人工智能
机器学习
计算生物学
生物
地理
地图学
作者
Kyle Swanson,Parker Walther,Jeremy Leitz,Souhrid Mukherjee,Joseph C. Wu,Rabindra V. Shivnaraine,James Zou
出处
期刊:CERN European Organization for Nuclear Research - Zenodo
日期:2023-12-13
标识
DOI:10.5281/zenodo.10372418
摘要
This repository contains data and models used in the following paper. Swanson, K., Walther, P., Leitz, J., Mukherjee, S., Wu, J. C., Shivnaraine, R. V., & Zou, J. ADMET-AI: A machine learning ADMET platform for evaluation of large-scale chemical libraries. In review. The data and models are meant to be used with the ADMET-AI code, which runs the ADMET-AI web server at admet.ai.greenstonebio.com. The data.zip file has the following structure. data drugbank: Contains files with drugs from the DrugBank that have received regulatory approval. drugbank_approved.csv contains the full set of approved drugs along with ADMET-AI predictions, while the other files contain subsets of these molecules used for testing the speed of ADMET prediction tools. tdc_admet_all: Contains the data (.csv files) and RDKit features (.npz files) for all 41 single-task ADMET datasets from the Therapeutics Data Commons (TDC). tdc_admet_multitask: Contains the data (.csv files) and RDKit features (.npz files) for the two multi-task datasets (one regression and one classification) constructed by combining the tdc_admet_all datasets. tdc_admet_all.csv: A CSV file containing all 41 ADMET datasets from tdc_admet_all. This can be used to easily look up all ADMET properties for a given molecule in the TDC. tdc_admet_group: Contains the data (.csv files) and RDKit features (.npz files) for the 22 TDC ADMET Benchmark Group datasets with five splits per dataset. tdc_admet_group_raw: Contains the raw data (.csv files) used to construct the five splits per dataset in tdc_admet_group. The models.zip file has the following structure. Note that the ADMET-AI website and Python package use the multi-task Chemprop-RDKit models below. models tdc_admet_all: Contains Chemprop and Chemprop-RDKit models trained on all 41 single-task TDC ADMET datasets. tdc_admet_all_multitask: Contains Chemprop and Chemprop-RDKit models trained on the two multi-task TDC ADMET datasets (one regression and one classification). tdc_admet_group: Contains Chemprop and Chemprop-RDKit models trained on the 22 TDC ADMET Benchmark Group datasets.
科研通智能强力驱动
Strongly Powered by AbleSci AI