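Artificial intelligence
Pattern recognition (psychology)
Random forest
Molecular descriptor
Cross-validation
Computer science
Computational biology
Biological system
Chemistry
Biology
Machine learning
Quantitative structure-activity relationship
Authors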
Srijit Seal,Hongbin Yang,Luis Vollmers,Andreas Bender
Identifier
DOI:10.1021/acs.chemrestox.0c00303
Abstract
Cell morphology features, such as those from the Cell Painting assay, can be generated at relatively low cost and represent versatile biological descriptors of a system and thereby of compound response. In this study, we explored cell morphology descriptors and molecular fingerprints, separately and in combination, for the prediction of cytotoxicity- and proliferation-related in vitro assay endpoints. We selected 135 compounds from the MoleculeNet ToxCast benchmark data set that were annotated with Cell Painting readouts; the relatively small size of the data set is due to the overlap of required annotations. We trained Random Forest classification models using nested cross-validation on Cell Painting descriptors, Morgan and ErG fingerprints, and their combinations. When using leave-one-cluster-out cross-validation (with clusters based on physicochemical descriptors), models using Cell Painting descriptors achieved higher average performance over all assays (balanced accuracy (BA) of 0.65, Matthews correlation coefficient (MCC) of 0.28, and AUC-ROC of 0.71) compared to models using ErG fingerprints (BA 0.55, MCC 0.09, and AUC-ROC 0.60) and Morgan fingerprints alone (BA 0.54, MCC 0.06, and AUC-ROC 0.56). When using random shuffle splits, combining Cell Painting descriptors with ErG and Morgan fingerprints further improved balanced accuracy on average by 8.9% (in 9 out of 12 assays) and 23.4% (in 8 out of 12 assays) compared to using only ErG and Morgan fingerprints, respectively. Regarding feature importance, Cell Painting descriptors related to nuclei texture, granularity of cells, and cytoplasm as well as cell neighbors and radial distributions were identified as contributing most, which is plausible given the endpoints considered.
We conclude that cell morphological descriptors contain complementary information to molecular fingerprints which can be used to improve the performance of predictive cytotoxicity models, in particular in areas of novel structural space.
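
The leave-one-cluster-out evaluation described in the abstract can be illustrated with a minimal scikit-learn sketch. This is not the authors' code: the data are synthetic placeholders, and the number of clusters, the use of k-means, the Random Forest settings, the 0.5 decision threshold, and the helper name `evaluate_loco` are all assumptions made for illustration. Nested hyperparameter tuning is omitted for brevity.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    balanced_accuracy_score,
    matthews_corrcoef,
    roc_auc_score,
)
from sklearn.model_selection import LeaveOneGroupOut

# Synthetic stand-ins (assumption): 135 compounds, as in the abstract,
# with made-up descriptor blocks and a made-up binary assay label.
rng = np.random.default_rng(0)
n_compounds = 135
physchem = rng.normal(size=(n_compounds, 6))      # physicochemical descriptors
morphology = rng.normal(size=(n_compounds, 50))   # Cell Painting-like features
labels = (morphology[:, 0] + 0.5 * rng.normal(size=n_compounds) > 0).astype(int)

# Group compounds by physicochemical similarity. k-means with 5 clusters
# is an illustrative choice, not the clustering used in the paper.
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(physchem)


def evaluate_loco(features, y, groups):
    """Hold out one cluster at a time and pool the out-of-fold predictions."""
    oof_prob = np.zeros(len(y))
    for train_idx, test_idx in LeaveOneGroupOut().split(features, y, groups=groups):
        model = RandomForestClassifier(
            n_estimators=200, class_weight="balanced", random_state=0
        )
        model.fit(features[train_idx], y[train_idx])
        oof_prob[test_idx] = model.predict_proba(features[test_idx])[:, 1]
    oof_pred = (oof_prob >= 0.5).astype(int)
    metrics = {
        "balanced_accuracy": balanced_accuracy_score(y, oof_pred),
        "mcc": matthews_corrcoef(y, oof_pred),
        "auc_roc": roc_auc_score(y, oof_prob),
    }
    return oof_prob, oof_pred, metrics


oof_prob, oof_pred, metrics = evaluate_loco(morphology, labels, clusters)
print(metrics)
```

Under the same assumptions, a combined-descriptor model would be evaluated by passing a column-wise concatenation of feature blocks, for example `np.hstack([morphology, fingerprints])`, where `fingerprints` is a hypothetical fingerprint matrix for the same compounds. Metrics are computed on pooled out-of-fold predictions so that a held-out cluster containing a single class does not break the AUC-ROC calculation.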