Python (programming language)
Computer science
R package
Programming language
Authors
Jochen Sieg, Christian Feldmann, Jennifer Hemmerich, Conrad Stork, Frederik Sandfort, Philipp Eiden, Miriam Mathea
Identifier
DOI: 10.26434/chemrxiv-2024-kd11b
Abstract
The open-source package scikit-learn provides various machine learning algorithms and data processing tools, including the Pipeline class, which allows users to prepend custom data transformation steps to the machine learning model. We introduce the MolPipeline package, which extends this concept to cheminformatics by wrapping default functionalities of RDKit, such as reading and writing SMILES strings or calculating molecular descriptors from a molecule object. We aimed to build an easy-to-use Python package for creating completely automated end-to-end pipelines that scale to large data sets. Particular emphasis was put on handling erroneous instances, which would otherwise require manual intervention in default pipelines. In addition, we included common cheminformatics tasks, like scaffold splits and molecular standardization, natively in the pipeline framework, where they can be adapted to the needs of various projects.
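The error-handling idea in the abstract can be illustrated with a minimal, standard-library-only sketch. It does not use the MolPipeline, scikit-learn, or RDKit APIs: `Invalid`, `toy_parse`, `toy_descriptor`, and `run_pipeline` are hypothetical names invented for this illustration, and the "parser" only checks for balanced parentheses. The assumption is that a failing instance is replaced by a placeholder, skipped by later steps, and filled with a default value at the end so that output positions still match the inputs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Invalid:
    """Hypothetical placeholder for an input that failed at some step."""
    step: str
    reason: str


def toy_parse(smiles: str) -> str:
    """Toy stand-in for a SMILES parser: only checks balanced parentheses."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if depth < 0:
            raise ValueError("unbalanced parentheses")
    if depth != 0 or not smiles:
        raise ValueError("unbalanced parentheses or empty string")
    return smiles


def toy_descriptor(mol: str) -> float:
    """Toy stand-in for a descriptor: counts the characters 'C' and 'c'."""
    return float(sum(ch in "Cc" for ch in mol))


Step = Tuple[str, Callable[[Any], Any]]


def run_pipeline(
    inputs: Sequence[str], steps: Sequence[Step], fill_value: Any = None
) -> Tuple[List[Any], Dict[int, Invalid]]:
    """Apply each step in order; a failing instance skips the remaining steps."""
    results: List[Any] = []
    for item in inputs:
        value: Any = item
        for name, func in steps:
            try:
                value = func(value)
            except ValueError as exc:
                # Record where and why the instance failed, then stop processing it.
                value = Invalid(step=name, reason=str(exc))
                break
        results.append(value)
    # Outputs keep the input order and length; failures become fill_value.
    outputs = [fill_value if isinstance(v, Invalid) else v for v in results]
    errors = {i: v for i, v in enumerate(results) if isinstance(v, Invalid)}
    return outputs, errors


steps = [("parse", toy_parse), ("descriptor", toy_descriptor)]
outputs, errors = run_pipeline(
    ["CCO", "C(C", "c1ccccc1"], steps, fill_value=float("nan")
)
```

Here the second input fails at the toy parsing step, so `errors` holds a single entry for index 1, while `outputs` still has three positions. Keeping failures in place rather than dropping them is one plausible way to keep predictions aligned with the original data set without manual clean-up.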