Towards better process management in wastewater treatment plants: Process analytics based on SHAP values for tree-based machine learning methods

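Overfitting, Mean squared error, Effluent, Process (computing), Random forest, Tree (set theory), Wastewater, Artificial intelligence, Computer science, Machine learning, Data mining, Artificial neural network, Environmental science, Statistics, Mathematics, Environmental engineering, Mathematical analysis, Operating system
Authors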
Dong Wang,Sven Thunéll,Ulrika Lindberg,Lili Jiang,Johan Trygg,Mats Tysklind
Source
Journal: Journal of Environmental Management [Elsevier BV]
Volume/Issue: 301: 113941-113941 Cited by: 134
Identifier
DOI:10.1016/j.jenvman.2021.113941
Abstract

Understanding the mechanisms of pollutant removal in Wastewater Treatment Plants (WWTPs) is crucial for controlling effluent quality efficiently. However, the numerous treatment units, operational factors, and the underlying interactions between these units and factors usually obscure a comprehensive and precise understanding of the processes. We have previously proposed a machine learning (ML) framework to uncover complex cause-and-effect relationships in WWTPs. However, only one interpretable ML model, Random forest (RF), was studied, and the interpretation method was not granular enough to reveal very detailed relationships between operational factors and effluent parameters. Thus, in this paper, we present an upgraded framework involving three interpretable tree-based models (RF, XGBoost and LightGBM), three metrics (R², Root mean squared error (RMSE), and Mean absolute error (MAE)) and a more advanced interpretation system, SHapley Additive exPlanations (SHAP). Details of the framework are provided along with a demonstration of its practical applicability based on a case study of the Umeå WWTP in Sweden. Results show that, for both labels TSSe (Total suspended solids in effluent) and PO4e (Phosphate in effluent), the XGBoost models are optimal whereas the RF models are the least optimal, due to overfitting and polarized fitting. This study has yielded multiple new and significant findings with respect to the control of TSSe and PO4e in the Umeå WWTP and other similarly configured WWTPs. Additionally, this study has produced two important generic findings relating to ML applications for WWTPs (or even other process industries) in terms of cause-and-effect investigations. First, the model comparison should be carried out from multiple perspectives to ensure that underlying details are fully revealed and examined. Second, using a precise, robust, and granular (feature attribution available for individual instances) explanation method can bring extra insight into both model comparison and model interpretation. SHAP is recommended as we found it to be of great value in this study.
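
To make the three metrics and the idea of per-instance feature attribution concrete, here is a minimal pure-Python sketch. It is not the authors' pipeline: the abstract does not give implementation details, and the study works with tree ensembles and the SHAP method. Here a hypothetical two-feature decision rule (`toy_tree`) stands in for a fitted tree model, and Shapley values are computed by brute-force enumeration of feature coalitions rather than by any optimized tree algorithm. The names `toy_tree`, `shapley_values`, and the `background` rows are assumptions made for illustration only.

```python
import math
from itertools import combinations


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def toy_tree(features):
    """Hypothetical depth-2 decision rule standing in for a fitted tree model."""
    if features[0] > 0.5:
        return 10.0 if features[1] > 0.5 else 6.0
    return 2.0


def shapley_values(model, x, background):
    """Exact Shapley values for a single instance x (brute force, small n only).

    Features outside a coalition take their values from each background row,
    and the model output is averaged over those rows.
    Returns (per-feature attributions, base value).
    """
    n = len(x)

    def value(coalition):
        total = 0.0
        for row in background:
            mixed = [x[i] if i in coalition else row[i] for i in range(n)]
            total += model(mixed)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi, value(set())


if __name__ == "__main__":
    background = [[0, 0], [0, 1], [1, 0], [1, 1]]  # illustrative data, not from the paper
    phi, base = shapley_values(toy_tree, [1, 1], background)
    print("base value:", base, "attributions:", phi)
    print("R2:", r2([1, 2, 3], [1, 2, 4]))
```

The attributions for one instance sum, together with the base value, to that instance's prediction. This additivity is what makes the explanation granular: each individual prediction can be broken down by feature, rather than only ranking features globally.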