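Anomaly detection
Data cleansing
Data mining
Computer science
Identification (biology)
Data set
Anomaly (physics)
Outlier
Global Positioning System
Data quality
Dynamic data
Set (abstract data type)
Artificial intelligence
Engineering
Metric (unit)
Operations management
Plant
Physics
Biology
Condensed matter physics
Telecommunications
Programming language
Authors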
Kang Yang, Youliang Ding, Huachen Jiang, Hanwei Zhao, Gan Luo
Abstract
Data cleansing is essential for improving data quality and is therefore key to preventing false alarms that a monitoring system raises because of anomalies in the data itself. Data cleansing consists of two parts: anomaly identification and anomaly repair. However, current research has focused mainly on anomaly identification and lacks efficient data repair methods. The key to data repair lies in sensor correlation models built on the mapping relationships between sensors. To obtain a good inter-sensor relationship model, anomalous data must first be excluded from the training data set used for modeling. Therefore, a two-stage data cleansing framework for collaborative multi-sensor repair is proposed. First, based on an analysis of the anomalous features of GPS data, a bidirectional long short-term memory (Bi-LSTM) neural network model is adopted to classify and localize data anomalies, which determines the data segment to be repaired. Second, the data set for repair is constructed from all sensor data in the day preceding the target repair segment, with the anomalous segments flagged by the identification stage excluded. A conditional generative adversarial network (CGAN) is then proposed to perform the repair. Experimental validation shows that the two-stage method of identification followed by repair can accurately identify and repair GPS anomalies. Finally, several factors affecting the repair effect are discussed.
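The step that links the two stages, building a repair data set from the preceding day while dropping segments flagged as anomalous, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_repair_training_indices`, the half-open index intervals, and the `samples_per_day` parameter are all hypothetical, and the abstract does not say how anomalies on different sensors are combined (here any flagged interval removes those samples for all sensors).

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) sample indices (assumed convention)


def build_repair_training_indices(
    target: Interval,
    anomalies: List[Interval],
    samples_per_day: int,
) -> List[int]:
    """Return sample indices from the day before `target` that are anomaly-free.

    `anomalies` stands in for the output of the identification stage.
    """
    window_start = max(0, target[0] - samples_per_day)
    window_end = target[0]
    flagged = set()
    for a_start, a_end in anomalies:
        lo = max(a_start, window_start)
        hi = min(a_end, window_end)
        flagged.update(range(lo, hi))  # empty range when there is no overlap
    return [i for i in range(window_start, window_end) if i not in flagged]


# Toy example: 10 samples per "day", target segment to repair at [20, 25)
clean = build_repair_training_indices(
    target=(20, 25),
    anomalies=[(12, 14), (18, 22), (3, 5)],
    samples_per_day=10,
)
print(clean)  # [10, 11, 14, 15, 16, 17]
```

In this toy case the window is samples 10 to 19; samples 12, 13, 18, and 19 are dropped because they overlap flagged intervals, and the remaining indices would be used to train the repair model.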