A transfer Learning-Based LSTM strategy for imputing Large-Scale consecutive missing data and its application in a water quality prediction system

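Subject areas: Computer science; Missing data; Water quality; Imputation (statistics); Machine learning; Flexibility (engineering); Artificial intelligence; Transfer of learning; Resource (disambiguation); Deep learning; Water resources; Data mining; Proportion (ratio); Statistics; Mathematics; Quantum mechanics; Biology; Computer network; Physics; Ecology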

Authors

Chen Zeng, Huan Xu, Peng Jiang, Shanen Yu, Guang Lin, Igor Bychkov, Alexei E. Hmelnov, Г. М. Ружников, Ning Zhu, Zhen Liu

Source

Journal: Journal of Hydrology [Elsevier BV]
Date: 2021-07-29; Volume: 602, article 126573; Times cited: 81

Identifiers

DOI: 10.1016/j.jhydrol.2021.126573

Abstract

In recent years, water quality monitoring has been crucial for improving water resource protection and management. Under the relevant laws and regulations, environmental protection agencies monitor lakes, streams, rivers, and other water bodies to assess water quality conditions. The valid, high-quality data generated by these monitoring activities help water resource managers understand existing pollution situations, energy consumption problems, and pollution control needs. However, real-world water quality data inevitably suffer from many problems caused by human error or system failures, and one of the most frequent is missing data. Although most existing studies have applied classic statistical methods or emerging machine/deep learning methods to fill gaps in data, these methods are not suitable for large-scale consecutive missing data. To address this issue, this paper proposes a novel algorithm called TrAdaBoost-LSTM, which integrates state-of-the-art deep learning, in the form of long short-term memory (LSTM) networks, with instance-based transfer learning via TrAdaBoost. The model inherits the advantages of both the LSTM model and the transfer learning technique: the ability to capture long-term dependencies in time series, and the flexibility to leverage related knowledge from complete datasets to fill in large-scale consecutive missing data. A case study using dissolved oxygen concentrations obtained from water quality monitoring stations is conducted to validate the effectiveness and superiority of the proposed method. The results show that, according to the performance indicators obtained, the proposed TrAdaBoost-LSTM model improves imputation accuracy by 15% to 25% compared with alternative models, and it also offers potential ideas for similar future research.
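The abstract does not spell out the training loop, so the following is only a minimal sketch of the instance-based transfer idea, not the authors' implementation. It applies the original TrAdaBoost weight-update rule with the 0/1 classification loss replaced by a normalized absolute error (a common regression adaptation; the exact variant used in the paper is not stated here). Source instances (a complete, related station series) that the current model fits poorly are down-weighted, while poorly fitted target instances (the few observed points of the gappy series) are up-weighted. Assumptions: a weighted least-squares line stands in for the LSTM so the example runs with only the standard library, the helper names `fit_weighted_line` and `tradaboost_regression` are hypothetical, and the final prediction is a `log(1/beta_t)`-weighted average of the later hypotheses in place of TrAdaBoost's classification vote.

```python
import math
from typing import Callable, List, Tuple

Pair = Tuple[float, float]


def fit_weighted_line(xs: List[float], ys: List[float], ws: List[float]) -> Callable[[float], float]:
    """Weighted least-squares fit of y = a*x + b.

    Hypothetical stand-in for the LSTM base learner (assumption), so the sketch
    needs only the standard library.
    """
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    a = sxy / sxx if sxx > 0 else 0.0
    b = my - a * mx
    return lambda x: a * x + b


def tradaboost_regression(src: List[Pair], tgt: List[Pair], n_iters: int = 10) -> Callable[[float], float]:
    """Hypothetical TrAdaBoost-style instance reweighting adapted to regression.

    src: (x, y) pairs from a complete, related series (source domain), non-empty.
    tgt: observed (x, y) pairs from the series with gaps (target domain), non-empty.
    """
    n, m = len(src), len(tgt)
    data = src + tgt
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    w = [1.0] * (n + m)
    # Fixed source-domain factor from the original TrAdaBoost: 1 / (1 + sqrt(2 ln n / N)).
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / n_iters))
    models, betas = [], []
    for _ in range(n_iters):
        total = sum(w)
        w = [wi / total for wi in w]  # normalize weights to a distribution
        h = fit_weighted_line(xs, ys, w)
        errs = [abs(h(x) - y) for x, y in data]
        max_err = max(errs) or 1.0
        e = [err / max_err for err in errs]  # normalized loss in [0, 1]
        # Weighted error measured on the target domain only.
        eps = sum(w[i] * e[i] for i in range(n, n + m)) / sum(w[n:])
        eps = min(max(eps, 1e-10), 0.499)  # assumption: clamp so beta_t stays in (0, 1)
        beta_t = eps / (1.0 - eps)
        models.append(h)
        betas.append(beta_t)
        # Source instances that the model fits poorly are down-weighted...
        for i in range(n):
            w[i] *= beta_src ** e[i]
        # ...while poorly fitted target instances are up-weighted, as in AdaBoost.
        for i in range(n, n + m):
            w[i] *= beta_t ** (-e[i])
    # Keep hypotheses ceil(N/2)..N (1-indexed), as TrAdaBoost does, and combine them
    # by a log(1/beta_t)-weighted average (assumption: replaces the classification vote).
    start = (n_iters - 1) // 2
    alphas = [math.log(1.0 / bt) for bt in betas[start:]]
    kept = models[start:]

    def predict(x: float) -> float:
        return sum(al * hk(x) for al, hk in zip(alphas, kept)) / sum(alphas)

    return predict
```

For example, if the source series follows `y = 2x + 1` but also contains a block of points shifted by +19, and only three target points are observed, the reweighted model's prediction at `x = 10` lands much closer to 21 than a plain unweighted fit over all points. In an actual TrAdaBoost-LSTM setup, the line fit would be replaced by training an LSTM on lagged windows with per-sample weights (for instance through the `sample_weight` argument of Keras `Model.fit`).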
