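Pollutant
Toxicity
Environmental science
Environmental chemistry
Acute toxicity
Throughput
Water pollutant
Artificial intelligence
Machine learning
Computer science
Chemistry
Organic chemistry
Telecommunications
Wireless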
Authors
Shuo Chen,Tengjiao Fan,Ting Ren,Na Zhang,Lijiao Zhao,Rugang Zhong,Guohui Sun
Identifier
DOI:10.1016/j.jhazmat.2024.136295
Abstract
This study used available oral acute toxicity data in Rat and Mouse for polychlorinated persistent organic pollutants (PC-POPs) to construct data fusion-driven machine learning (ML) global models. Based on atom-centered fragments (ACFs), the collected high-throughput data overcame applicability limitations, enabling accurate toxicity prediction for a wide range of PC-POPs series compounds with single models. The data variances in the Rat training and test sets were 1.52 and 1.34, respectively, while for the Mouse the values were 1.48 and 1.36, respectively. A genetic algorithm (GA) was used to build multiple linear regression (MLR) models and pre-screen descriptors, addressing the "black-box" problem prevalent in ML and enhancing model interpretability. The best ML models for Rat and Mouse achieved approximately 90% prediction reliability for over 100,000 true untested compounds. Ultimately, a warning list of highly toxic compounds for eight categories of polychlorinated atom-centered fragments (PCACFs) was generated from the prediction results. The analysis of descriptors revealed that dioxin analogs generally exhibited higher toxicity, because the heteroatoms and ring systems increased structural complexity and formed larger conjugated systems, contributing to greater oral acute toxicity. The present study provides valuable insights for guiding subsequent in vivo tests, environmental risk assessment, and the improvement of the global governance system for pollutants.
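The abstract states that a genetic algorithm was used to build MLR models and pre-screen descriptors, but it gives no implementation details. The following is only a minimal sketch of what GA-driven descriptor selection for MLR can look like. The fitness function (training R²), the population settings, the function names `fit_mlr_r2` and `ga_select_descriptors`, and the synthetic data are all assumptions made for illustration, not the authors' procedure or data.

```python
import numpy as np

def fit_mlr_r2(X, y):
    """Fit ordinary least squares with an intercept and return the training R^2."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def ga_select_descriptors(X, y, n_select=3, pop_size=30, n_gen=40,
                          mutation_rate=0.3, seed=0):
    """Hypothetical GA: each individual is a set of n_select descriptor columns."""
    rng = np.random.default_rng(seed)
    n_desc = X.shape[1]

    def fitness(ind):
        # Assumption: training R^2 as fitness; real studies often use
        # cross-validated statistics instead.
        return fit_mlr_r2(X[:, list(ind)], y)

    # Random initial population of distinct-column index tuples.
    pop = [tuple(sorted(rng.choice(n_desc, n_select, replace=False).tolist()))
           for _ in range(pop_size)]

    for _ in range(n_gen):
        ranked = sorted(set(pop), key=fitness, reverse=True)
        parents = ranked[:max(2, pop_size // 3)]
        children = list(parents)  # elitism: the best subsets survive unchanged
        while len(children) < pop_size:
            i, j = rng.integers(len(parents), size=2)
            # Crossover: draw the child's columns from the union of both parents.
            pool = sorted(set(parents[i]) | set(parents[j]))
            child = rng.choice(pool, n_select, replace=False).tolist()
            # Mutation: swap one column for a descriptor not yet in the child.
            if rng.random() < mutation_rate:
                candidates = [d for d in range(n_desc) if d not in child]
                child[int(rng.integers(n_select))] = int(rng.choice(candidates))
            children.append(tuple(sorted(child)))
        pop = children

    best = max(set(pop), key=fitness)
    return best, fitness(best)

# Synthetic demonstration data (not from the paper): 10 descriptors,
# of which only columns 1, 4 and 7 actually drive the response.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 1] - 3.0 * X[:, 4] + 1.5 * X[:, 7] + 0.1 * rng.normal(size=200)

selected, r2 = ga_select_descriptors(X, y)
print("selected descriptor columns:", selected, "training R^2:", round(r2, 3))
```

Because the result is a plain linear model over a few named descriptors, its coefficients can be read directly, which is the kind of interpretability the abstract contrasts with "black-box" ML models.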