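Quantitative structure–activity relationship
Similarity (geometry)
Support vector machine
Molecular descriptor
Partial least squares regression
Artificial intelligence
Computer science
Test set
Hyperparameter
Pattern recognition (psychology)
Random forest
Set (abstract data type)
Linear regression
Data mining
Machine learning
Mathematics
Image (mathematics)
Programming language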
Authors
Arkaprava Banerjee,Kunal Roy
Identifier
DOI:10.1021/acs.chemrestox.2c00374
Abstract
The novel quantitative read-across structure–activity relationship (q-RASAR) approach uses read-across-derived similarity functions within the quantitative structure–activity relationship (QSAR) modeling framework for supervised model generation. This study explores how the workflow improves the external (test set) prediction quality of conventional QSAR models by incorporating novel similarity-based functions as additional descriptors while using the same level of chemical information. To establish this, five toxicity data sets for which QSAR models were reported previously were considered in the q-RASAR modeling exercise, which uses chemical similarity-derived measures. For ease of comparison, the same sets of chemical features and the same training and test set compositions as reported previously were used. The RASAR descriptors were calculated from a chosen similarity measure with default settings of the relevant hyperparameter(s) and were then combined with the original structural and physicochemical descriptors; the number of selected features was further optimized by a grid search applied to the respective training sets. These features were then used to develop multiple linear regression (MLR) q-RASAR models, which show enhanced predictivity compared to the previously developed QSAR models. In addition, other ML algorithms, such as support vector machine (SVM), linear SVM, random forest, partial least squares, and ridge regression, were employed with the same feature combinations as in the MLR models to compare prediction quality.
The q-RASAR models for the five data sets each include at least one of the RASAR descriptors RA function, gm, and average similarity, suggesting that these are important determinants of similarity that contribute to the development of predictive q-RASAR models, as is also evident from the SHAP analysis of the models.
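The general idea of augmenting a descriptor matrix with similarity-derived features and then fitting an MLR model can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the Gaussian-kernel similarity, the choice of `k` and `sigma`, the three example features (a similarity-weighted average response, the mean similarity, and the maximum similarity to the closest training compounds), the synthetic data, and the function names `similarity_descriptors`, `fit_mlr`, and `predict_mlr`. The published RASAR descriptors such as RA function and gm are not reproduced here, and the grid-search feature selection step is omitted.

```python
import numpy as np


def similarity_descriptors(X_query, X_train, y_train, k=3, sigma=1.0,
                           exclude_self=False):
    """Illustrative similarity-based features for each query compound.

    Returns an (n_query, 3) array with columns:
    similarity-weighted mean response of the k closest training compounds,
    mean similarity to them, and maximum similarity to them.
    """
    # Pairwise squared Euclidean distances between query and training rows.
    diff = X_query[:, None, :] - X_train[None, :, :]
    d2 = np.sum(diff ** 2, axis=2)
    # Gaussian-kernel similarity in (0, 1] (an assumed similarity measure).
    sim = np.exp(-d2 / (2.0 * sigma ** 2))
    if exclude_self:
        # Only valid when X_query is X_train: a compound must not be
        # counted as its own neighbour when building training features.
        np.fill_diagonal(sim, -np.inf)
    # Indices of the k most similar training compounds per query compound.
    idx = np.argsort(-sim, axis=1)[:, :k]
    top_sim = np.take_along_axis(sim, idx, axis=1)
    top_y = y_train[idx]
    weighted = np.sum(top_sim * top_y, axis=1) / np.sum(top_sim, axis=1)
    return np.column_stack([weighted, top_sim.mean(axis=1), top_sim.max(axis=1)])


def fit_mlr(X, y):
    """Ordinary least squares with an intercept term."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    return coef[0] + X @ coef[1:]


# Synthetic stand-in data (hypothetical; not one of the five toxicity sets).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 4))
y_train = X_train[:, 0] - 0.5 * X_train[:, 1] + 0.1 * rng.normal(size=40)
X_test = rng.normal(size=(10, 4))

# Similarity features are always computed against the training set only.
ra_train = similarity_descriptors(X_train, X_train, y_train, exclude_self=True)
ra_test = similarity_descriptors(X_test, X_train, y_train)

# Append the similarity features to the original descriptors, then fit MLR.
Z_train = np.hstack([X_train, ra_train])
Z_test = np.hstack([X_test, ra_test])
coef = fit_mlr(Z_train, y_train)
y_pred = predict_mlr(coef, Z_test)
```

In this sketch the test-set features are derived only from training compounds and their responses, so no test-set response information enters the model. The same augmented matrices could be passed to other regressors to mirror the algorithm comparison described in the abstract.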