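Quantitative structure–activity relationship
Random forest
Gradient boosting
Chemistry
Molecular descriptor
Acute toxicity
Biological system
Toxicity
Machine learning
Computer science
Organic chemistry
Stereochemistry
Biology
Authors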
Shuang Wu,Shixin Li,Jing Qiu,Hai-Ming Zhao,Yan-Wen Li,Nai-Xian Feng,Bailin Liu,Quan-Ying Cai,Lei Xiang,Ce-Hui Mo,Qing X. Li
Identifier
DOI:10.1021/acs.est.4c03966
Abstract
Acute oral toxicity data are currently unavailable for most polycyclic aromatic hydrocarbons (PAHs), and especially for their derivatives, because determining all of them experimentally is cost-prohibitive. Here, quantitative structure–activity relationship (QSAR) models using machine learning (ML) were developed to predict the toxicity of PAH derivatives, based on rat oral toxicity data points for 788 individual substances. Both the individual ML algorithm gradient boosting regression trees (GBRT) and the stacking ML algorithm (extreme gradient boosting + GBRT + random forest regression) provided the best prediction results, with satisfactory determination coefficients for both cross-validation and the test set. PAH derivatives with fewer polar hydrogens, more large-sized atoms, more branches, and lower polarizability were found to have higher toxicity. Software based on the optimal ML-QSAR model was developed to expand the application potential of the model, yielding reliable predictions of pLD50 values and reference doses for 6893 external PAH derivatives. Among these chemicals, 472 were identified as moderately or highly toxic; 10 of them had clear records of environmental detection or use. The findings provide valuable insights into the toxicity of PAHs and their derivatives, offering a standard platform for effectively evaluating chemical toxicity using ML-QSAR models.