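Interpretability
Pollutant
Machine learning
Artificial intelligence
Computer science
Trace (psycholinguistics)
Metric (unit)
Classifier (UML)
Environmental science
Data mining
Ecology
Engineering
Operations management
Linguistics
Biology
Philosophy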
Authors
Shin‐Tson Wu, Zhongyao Liang, Qianlinglin Qiu
Source
Journal: ACS ES&T Water
[American Chemical Society]
Date: 2023-09-26
Volume (Issue): 4 (3): 1155-1165
Citations: 1
Identifier
DOI:10.1021/acsestwater.3c00478
Abstract
Trace pollutants are widely observed in aquatic ecosystems and can significantly impact human health and the environment. Accurate prediction of trace pollutants and understanding their response to environmental drivers are key to environmental management, yet these tasks remain challenging. An important reason for this challenge is that monitoring data for trace pollutants are often left-censored, leading to biased estimation and inaccurate response-driver relationships. Here we propose a novel two-stage interpretable machine learning framework applicable to left-censored trace pollutant data. The two stages include (1) a classifier to predict the presence of the pollutant and (2) a regressor to predict the pollutant concentration if present. The two stages are followed by a model interpretation step to understand the contribution of drivers to the presence and the concentration of the pollutant, respectively. We take the prediction of microcystin (MICX) in lakes across the United States as a case study. Applying this framework to MICX consistently improved prediction accuracy for both its occurrence and its concentration, regardless of the algorithm and performance metric used. The best-performing algorithm using the two-stage framework, compared with the baseline model, improves classification performance by 48% to 290% and regression performance by 11% to 33%, depending on the metric used to evaluate performance. The interpretable machine learning model also successfully revealed the impacts of the most important drivers on the presence of MICX and its concentration. Our results show the advantages of this framework, including its interpretability for understanding the driver-response relationship, ability to handle nonlinearity, better prediction performance, differentiation between the underlying processes, and potential to be generalized to other pollutants. As such, we anticipate that the framework we propose will be a starting point for using state-of-the-art interpretable machine learning models for predicting trace pollutants.
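The two-stage idea described in the abstract (first classify presence, then regress concentration only where the pollutant is present) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the learners here are deliberately simple stand-ins (a one-feature threshold rule and a least-squares line on log concentration), and the function names, the single driver `x`, and the toy data are all hypothetical. Left-censored samples (below the detection limit) are encoded as `None`.

```python
import math

def fit_stump(x, present):
    """Stage 1 (stand-in classifier): choose the threshold on x that
    best separates detected from non-detected samples."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(x)):
        acc = sum((xi >= t) == p for xi, p in zip(x, present)) / len(x)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def fit_line(x, y):
    """Stage 2 (stand-in regressor): ordinary least squares y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b

def fit_two_stage(x, conc):
    """conc[i] is None when the sample is left-censored (non-detect)."""
    present = [c is not None for c in conc]
    threshold = fit_stump(x, present)
    # The regressor is trained on detected samples only, on the log scale.
    xs = [xi for xi, c in zip(x, conc) if c is not None]
    ys = [math.log(c) for c in conc if c is not None]
    a, b = fit_line(xs, ys)
    return threshold, a, b

def predict_two_stage(model, xi):
    """Return None if predicted absent, else the predicted concentration."""
    threshold, a, b = model
    if xi < threshold:
        return None
    return math.exp(a + b * xi)

# Toy data (hypothetical): one driver and concentrations with three non-detects.
x = [1, 2, 3, 4, 5, 6, 7, 8]
conc = [None, None, None, 0.2, 0.4, 0.8, 1.6, 3.2]
model = fit_two_stage(x, conc)
print(predict_two_stage(model, 2))  # predicted absent
print(predict_two_stage(model, 9))  # predicted concentration
```

Keeping the two stages separate means non-detects never enter the regression, which is one way to avoid the bias that arises from substituting an arbitrary value for censored observations. In practice each stand-in would be replaced by a nonlinear learner and followed by a per-stage interpretation step, as the abstract describes.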