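Interpretability
Feature selection
Machine learning
Artificial neural network
Artificial intelligence
Computer science
Feature (linguistics)
Life-cycle assessment
Selection (genetic algorithm)
Molecular descriptor
Data mining
Filter (signal processing)
Resampling
Euclidean distance
Quantitative structure-activity relationship
Philosophy
Linguistics
Production (economics)
Economics
Computer vision
Macroeconomics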
Authors
Ye Sun,Xiuheng Wang,Nanqi Ren,Yanbiao Liu,Shijie You
Identifier
DOI:10.1021/acs.est.2c04945
Abstract
Machine learning (ML) provides an efficient means for rapid prediction of the life-cycle environmental impacts of chemicals, but challenges remain due to the low prediction accuracy and poor interpretability of the models. To address these issues, we focused on data processing, using a mutual information-permutation importance (MI-PI) feature selection method to filter out irrelevant molecular descriptors from the input data. Because this method generates no new variables, it preserves the physicochemical meanings of the original molecular descriptors and thus improves model interpretability. We also applied a weighted Euclidean distance method to mine the data most relevant to the predicted targets by quantifying the contribution of each feature, thereby improving prediction accuracy. On the basis of this data processing, we developed artificial neural network (ANN) models for predicting the life-cycle environmental impacts of chemicals, with R² values of 0.81, 0.81, 0.84, 0.75, 0.73, and 0.86 for global warming, human health, metal depletion, freshwater ecotoxicity, particulate matter formation, and terrestrial acidification, respectively. The ML models were interpreted using the Shapley additive explanation method, which quantifies the contribution of each input molecular descriptor to each environmental impact category. This work suggests that combining feature selection by MI-PI with source data selection based on weighted Euclidean distance has promising potential to improve the accuracy and interpretability of models for predicting the life-cycle environmental impacts of chemicals.
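The abstract does not give the exact form of the weighted Euclidean distance, so the following is only a minimal sketch of the general idea: each squared descriptor difference is scaled by a per-feature weight before summing, and the training chemicals closest to a target chemical are kept. The function names, the toy descriptor vectors, and the weight values are all hypothetical; in particular, treating the weights as normalized feature-importance scores is an assumption, not the authors' stated procedure.

```python
import math

def weighted_euclidean(x, y, weights):
    """Distance between two descriptor vectors; each squared
    difference is scaled by its (assumed) feature weight."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))

def select_nearest(target, candidates, weights, k):
    """Indices of the k candidates closest to the target
    under the weighted distance, nearest first."""
    ranked = sorted(
        range(len(candidates)),
        key=lambda i: weighted_euclidean(target, candidates[i], weights),
    )
    return ranked[:k]

# Hypothetical toy data: three descriptors per chemical.
# The weights are made-up importance scores, not values from the paper.
weights = [0.7, 0.2, 0.1]
target = [1.0, 0.0, 0.0]
candidates = [
    [1.0, 0.0, 5.0],  # differs only in the low-weight descriptor
    [3.0, 0.0, 0.0],  # differs only in the high-weight descriptor
    [1.0, 1.0, 0.0],  # differs only in the mid-weight descriptor
]

nearest = select_nearest(target, candidates, weights, k=2)
print(nearest)
```

With these toy numbers, an unweighted distance would rank the second candidate ahead of the first, whereas the weighted version penalizes its large difference in the most influential descriptor and ranks it last. That is the sense in which weighting can surface the source data most relevant to the prediction target.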