A Two-Stage Interpretable Machine Learning Framework for Accurate Prediction of Trace Pollutants: With an Application to Microcystin

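Subject tags: Interpretability, Pollutant, Machine learning, Artificial intelligence, Computer science, Trace (psycholinguistics), Metric (unit), Classifier (UML), Environmental science, Data mining, Ecology, Engineering, Operations management, Linguistics, Biology, Philosophy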

Authors

Shin‐Tson Wu, Zhongyao Liang, Qianlinglin Qiu

Source

Journal: ACS ES&T Water [American Chemical Society]
Date: 2023-09-26; Volume (Issue): 4 (3): 1155-1165; Cited by: 1

Abstract

Trace pollutants are widely observed in aquatic ecosystems and can significantly affect human health and the environment. Accurately predicting trace pollutants and understanding how they respond to environmental drivers are key to environmental management, yet both tasks remain challenging. An important reason is that monitoring data for trace pollutants are often left-censored: many observations fall below the detection limit and are reported only as non-detects. This leads to biased estimates and inaccurate driver-response relationships. Here we propose a novel two-stage interpretable machine learning framework for left-censored trace pollutant data. The two stages are (1) a classifier that predicts whether the pollutant is present and (2) a regressor that predicts the pollutant concentration when it is present. The two stages are followed by model interpretation, which quantifies the contribution of each driver to the presence and to the concentration of the pollutant, respectively. We use the prediction of microcystin (MICX) in lakes across the United States as a case study. Applying this framework to MICX consistently improved the prediction of both its occurrence and its concentration, regardless of the algorithm and performance metric used. Compared with the baseline model, the best-performing algorithm within the two-stage framework improved classification performance by 48% to 290% and regression performance by 11% to 33%, depending on the evaluation metric. The model interpretation also revealed how the most important drivers affect the presence of MICX and its concentration. Our results demonstrate the advantages of this framework: interpretability of the driver-response relationship, the ability to handle nonlinearity, better predictive performance, the ability to distinguish the processes underlying occurrence from those underlying concentration, and the potential to generalize to other pollutants. We therefore anticipate that the proposed framework will serve as a starting point for applying state-of-the-art interpretable machine learning models to the prediction of trace pollutants.
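
The abstract describes the framework only at a high level. As a rough illustration, here is a minimal pure-Python sketch of the two-stage (hurdle-style) idea, assuming linear models in both stages. A logistic classifier separates detected from non-detect samples, and a linear regression on log concentration is fitted only to the detected samples. The helper names (`fit_two_stage`, `TwoStageModel`, `_fit_gd`), the log transform, the 0.5 decision threshold, and the toy data are all hypothetical illustrations, not the authors' implementation. The paper compares several algorithms that this sketch does not reproduce.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def _dot(w: Vector, x: Vector) -> float:
    # Linear score dot(w, x) + b; the last weight is the intercept b.
    return sum(wj * xj for wj, xj in zip(w, x)) + w[-1]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def _fit_gd(X: Sequence[Vector], y: Vector, link: Callable[[float], float],
            lr: float = 0.1, epochs: int = 3000) -> List[float]:
    """Hypothetical helper: batch gradient descent for a linear model.

    link=_sigmoid gives logistic regression (log-loss); the identity link
    gives least squares. Both use the gradient (link(score) - y) * x.
    """
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = link(_dot(w, xi)) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


@dataclass
class TwoStageModel:
    clf_w: List[float]  # stage 1: presence (detected vs. non-detect)
    reg_w: List[float]  # stage 2: log concentration, given presence

    def prob_present(self, x: Vector) -> float:
        return _sigmoid(_dot(self.clf_w, x))

    def conc_if_present(self, x: Vector) -> float:
        # Back-transform from the log scale (a median-type estimate, not a mean).
        return math.exp(_dot(self.reg_w, x))

    def predict(self, x: Vector, threshold: float = 0.5) -> float:
        # 0.0 stands for "predicted non-detect"; otherwise use stage 2.
        if self.prob_present(x) < threshold:
            return 0.0
        return self.conc_if_present(x)


def fit_two_stage(X: Sequence[Vector],
                  conc: Sequence[Optional[float]]) -> TwoStageModel:
    # None marks a left-censored sample (reported below the detection limit).
    present = [0.0 if c is None else 1.0 for c in conc]
    clf_w = _fit_gd(X, present, _sigmoid)
    # Stage 2 sees only detected samples, so no value is imputed for non-detects.
    X_pos = [x for x, c in zip(X, conc) if c is not None]
    y_pos = [math.log(c) for c in conc if c is not None]
    reg_w = _fit_gd(X_pos, y_pos, lambda z: z)
    return TwoStageModel(clf_w, reg_w)


# Made-up toy data: two standardized drivers per sample.
X = [[-2.0, 0.0], [-1.5, 0.5], [-1.0, -0.5], [-0.5, 0.0],
     [0.5, 0.5], [1.0, -0.5], [1.5, 0.0], [2.0, 0.5]]
conc = [None, None, None, None, 0.3, 0.5, 0.9, 1.6]

model = fit_two_stage(X, conc)
print(model.predict([-1.5, 0.0]))  # 0.0 (predicted non-detect)
print(model.predict([1.8, 0.0]))   # stage-2 concentration estimate
```

Fitting stage 2 only on detected samples avoids substituting an arbitrary value for each non-detect. With linear stages, the sign and size of each weight in `clf_w` and `reg_w` give a crude, stage-specific reading of how a driver affects presence versus concentration. Nonlinear learners, such as those the paper compares, would need a model-agnostic interpretation method instead.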