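Quantitative structure–activity relationship
Robustness (evolution)
Mean squared error
Chemical toxicity
Regression
Training set
Acute toxicity
Data set
Linear regression
Regression analysis
Test set
Machine learning
Toxicity
Computer science
Artificial intelligence
Statistics
Mathematics
Chemistry
Biochemistry
Organic chemistry
Gene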
Authors
Tao Bo,Yaohui Lin,Jinglong Han,Zhineng Hao,Jingfu Liu
Identifier
DOI:10.1016/j.jhazmat.2023.131344
Abstract
Machine learning (ML) methods offer a new opportunity to build quantitative structure–activity relationship (QSAR) models that predict chemical toxicity from large toxicity data sets, but the robustness of these models is limited by poor data quality for chemicals with certain structures. To address this issue and improve model robustness, we built a large data set of rat oral acute toxicity covering thousands of chemicals, then used ML to screen for chemicals favorable for regression models (CFRM). CFRM accounted for 67% of the chemicals in the original data set and, compared with chemicals not favorable for regression models (CNRM), showed higher structural similarity and a narrower toxicity distribution within 2–4 log10(mg/kg). Regression models built for CFRM performed substantially better, with root-mean-square errors (RMSE) of 0.45–0.48 log10(mg/kg). For CNRM, classification models were built using all chemicals in the original data set, reaching an area under the receiver operating characteristic curve (AUROC) of 0.75–0.76. The proposed strategy was also applied successfully to a mouse oral acute toxicity data set, yielding an RMSE of 0.36–0.38 log10(mg/kg) and an AUROC of 0.79.
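The abstract reports regression performance as RMSE in log10(mg/kg) and classification performance as AUROC. As an illustration only, the sketch below shows how these two metrics are conventionally computed, in plain Python with no third-party libraries. The function names `rmse` and `auroc` and the toy inputs are hypothetical; this is not the authors' code, and it does not reproduce their CFRM/CNRM screening step.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error; the result has the same units as the inputs,
    e.g. log10(mg/kg) when both are log-transformed LD50 values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC computed as the probability that a randomly chosen positive
    (label 1) receives a higher score than a randomly chosen negative
    (label 0); tied scores count as half a win."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy values for illustration only (not data from the paper).
print(rmse([2.0, 3.0, 4.0], [2.5, 3.0, 3.5]))   # sqrt(0.5 / 3), about 0.408
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 3 of 4 pairs ranked correctly: 0.75
```

The pairwise AUROC is O(n·m) and is meant for clarity; for large data sets a rank-based (Mann–Whitney) formulation or a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.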