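Missing data
Anomaly detection
Imputation (statistics)
Computer science
Outlier
Data mining
Covariance
Multivariate statistics
Hyperparameter
Pattern recognition (psychology)
Artificial intelligence
Statistics
Machine learning
Mathematics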
Authors
Kisan Sarda,Amol Yerudkar,Carmen Del Vecchio
Identifier
DOI:10.1109/med59994.2023.10185791
Abstract
With the increasing interconnectivity of cyber-physical systems (CPSs) in various fields, such as manufacturing plants, power plants, and smart networked systems, large amounts of multivariate data are generated by sensors and actuators, as well as by other data sources such as measurements and images. This paper focuses on the anomaly detection (AD) problem, also known as fault detection or outlier detection depending on the type of dataset, which involves identifying anomalous values in a dataset using analytical methods. However, datasets often contain missing values, which leave them incomplete, can lead to incorrect outcomes, and reduce the availability of anomalous samples, which are already scarce. Therefore, a generalized AD method is proposed for incomplete datasets. It involves two steps: data imputation (DI) using a generative adversarial network (GAN) to obtain complete datasets, followed by AD on the complete datasets. Although statistical imputation methods are commonly used, they do not account for the data distribution of datasets with anomalous samples. The capabilities of GAN-based DI are tested under different hyperparameter settings and percentages of missing values. The AD problem is then addressed using seven unsupervised AD methods on six different datasets, including a real dataset from a steel manufacturing plant in Italy. Each dataset is analyzed to determine which combination of DI and AD methods performs best. The results show that GAN-based imputation provides the best DI performance, while the reweighted minimum covariance determinant (RMCD) method, combined with GAN imputation, offers the best overall AD results.
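The two-step structure described in the abstract (impute first, then detect) can be illustrated with a minimal Python sketch. This is not the authors' implementation: mean imputation from scikit-learn stands in for the GAN-based imputer, scikit-learn's `MinCovDet` (which applies a reweighting step after the raw fit) stands in for RMCD, and the synthetic data, the 5% missing rate, the 0.975 chi-square cutoff, and the function name `impute_then_detect` are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet
from sklearn.impute import SimpleImputer


def impute_then_detect(X_incomplete, quantile=0.975, random_state=0):
    """Hypothetical helper: impute missing values, then flag anomalies.

    Mean imputation is only a stand-in for GAN-based imputation, and the
    chi-square cutoff is an assumed convention, not taken from the paper.
    """
    # Step 1 (DI): replace NaN entries with per-feature means.
    X = SimpleImputer(strategy="mean").fit_transform(X_incomplete)

    # Step 2 (AD): fit a minimum covariance determinant estimator.
    mcd = MinCovDet(random_state=random_state).fit(X)
    d2 = mcd.mahalanobis(X)  # squared robust Mahalanobis distances

    # Flag samples whose squared distance exceeds the chi-square quantile.
    threshold = chi2.ppf(quantile, df=X.shape[1])
    return d2 > threshold, d2


# Synthetic data (assumption): 200 normal samples plus 10 shifted anomalies.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(200, 3))
anomalies = rng.normal(8.0, 0.5, size=(10, 3))
X_full = np.vstack([normal, anomalies])

# Remove roughly 5% of entries at random to mimic an incomplete dataset.
X_missing = X_full.copy()
X_missing[rng.random(X_full.shape) < 0.05] = np.nan

flags, d2 = impute_then_detect(X_missing)
print("flagged:", int(flags.sum()), "of", len(flags))
```

Swapping the imputer for a learned model changes only step 1; the detection step operates on whatever complete matrix it receives, which is why the abstract can compare DI and AD method combinations independently.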