Machine learning-based imputation soft computing approach for large missing scale and non-reference data imputation

Missing data; Imputation (statistics); Computer science; Data mining; Machine learning

Authors

A. H. Alamoodi, B. B. Zaidan, A. A. Zaidan, O. S. Albahri, Juliana Chen, M. A. Chyad, Salem Garfan, Ahmed Marwan Aleesa

Source

Journal: Chaos Solitons & Fractals [Elsevier BV]
Date: 2021-10-01; Volume: 151, article 111236; Citations: 31

Identifier

DOI: 10.1016/j.chaos.2021.111236

Abstract

Missing data is a common problem in real-world datasets and is among the most complex topics in computer science and many other research domains. The usual ways to cope with missing values are elimination or imputation, depending on the volume of the missing data and the nature of its distribution. This makes it imperative to develop new imputation approaches along with efficient algorithms. Although most existing imputation methods focus on moderate amounts of missing data, imputation at high missing rates (over 80%) remains important but challenging. The works that do address high volumes of missing data mostly rely on a reference dataset (a complete dataset used for evaluation): they introduce artificial missing values into it and then impute them to measure the accuracy of their proposed techniques. So far, imputing high proportions of missing values when no reference comparison dataset exists (that is, when only the original dataset with a high proportion of missing values is available) has often been ignored or overlooked. Therefore, we propose a missing data imputation approach for high volumes of missing values with no reference comparison dataset. The approach applies pre-processing measures, breaks the dataset into small continuous non-missing portions, and then uses Multi-Criteria Decision-Making (MCDM) analysis to select a portion of data that is representative of the entire broken dataset. This portion is used to create reference comparisons and to expand the missing dataset through procedures that introduce artificial missing values at different percentages, followed by imputation using different machine learning techniques. This study conducted two experiments using body mass index (BMI) datasets with more than 80% missing values, obtained from the National Child Development Centre (NCDRC) at Sultan Idris Education University (UPSI), Malaysia. The results demonstrate the capability of our approach to reconstruct datasets with very large volumes of missing values.
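The abstract outlines a pipeline but does not name the MCDM method, the selection criteria, the masking rates, or the imputers. The Python sketch below illustrates the general idea under those gaps and is not the authors' implementation. It splits a series into contiguous non-missing portions and picks one portion with simple additive weighting (SAW, an assumed stand-in for the unspecified MCDM analysis) over two assumed criteria: portion length, and closeness of the portion mean to the overall observed mean. It then masks that portion at several assumed rates, imputes the masked values with linear interpolation (a stand-in for the machine learning techniques), and scores each rate by RMSE. All function names, weights, rates, and toy values are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional, Sequence

# None marks a missing value. All names below are hypothetical illustrations.
Series = List[Optional[float]]


def contiguous_portions(series: Series, min_len: int = 3) -> List[List[float]]:
    """Split a series into runs of consecutive non-missing values of length >= min_len."""
    portions: List[List[float]] = []
    current: List[float] = []
    for value in series + [None]:  # trailing sentinel flushes the last run
        if value is None:
            if len(current) >= min_len:
                portions.append(current)
            current = []
        else:
            current.append(value)
    return portions


def select_representative(portions: List[List[float]],
                          weights: Sequence[float] = (0.5, 0.5)) -> List[float]:
    """Assumed SAW scoring: length is a benefit criterion,
    |portion mean - overall mean| is a cost criterion."""
    observed = [v for p in portions for v in p]
    overall_mean = sum(observed) / len(observed)
    lengths = [float(len(p)) for p in portions]
    deviations = [abs(sum(p) / len(p) - overall_mean) for p in portions]

    def normalize(xs: List[float], benefit: bool) -> List[float]:
        lo, hi = min(xs), max(xs)
        if hi == lo:
            return [1.0] * len(xs)
        return [(x - lo) / (hi - lo) if benefit else (hi - x) / (hi - lo) for x in xs]

    scores = [weights[0] * a + weights[1] * b
              for a, b in zip(normalize(lengths, True), normalize(deviations, False))]
    return portions[scores.index(max(scores))]


def linear_impute(series: Series) -> List[float]:
    """Stand-in imputer: linear interpolation between nearest observed neighbours.
    Assumes the first and last values are observed."""
    known = [i for i, v in enumerate(series) if v is not None]
    filled: List[float] = []
    for i, v in enumerate(series):
        if v is not None:
            filled.append(v)
            continue
        left = max(k for k in known if k < i)
        right = min(k for k in known if k > i)
        t = (i - left) / (right - left)
        filled.append(series[left] + t * (series[right] - series[left]))
    return filled


def evaluate_rates(reference: List[float], rates: Sequence[float] = (0.2, 0.5, 0.8),
                   seed: int = 0) -> Dict[float, float]:
    """Artificially mask interior points of the complete reference portion at each
    rate, impute them, and return the RMSE on the masked positions."""
    rng = random.Random(seed)
    interior = list(range(1, len(reference) - 1))  # keep endpoints for interpolation
    results: Dict[float, float] = {}
    for rate in rates:
        k = min(len(interior), max(1, round(rate * len(reference))))
        masked = set(rng.sample(interior, k))
        damaged: Series = [None if i in masked else v for i, v in enumerate(reference)]
        imputed = linear_impute(damaged)
        sq_err = sum((reference[i] - imputed[i]) ** 2 for i in masked)
        results[rate] = math.sqrt(sq_err / k)
    return results


# Invented toy BMI-like readings; NOT taken from the NCDRC datasets.
toy = [15.0, 15.1, 15.2, 15.3, None, None, 16.0, 16.2, 16.4, None,
       17.5, 17.6, 17.7, 17.8, 17.9, None]
portions = contiguous_portions(toy)
reference = select_representative(portions)
print(reference, evaluate_rates(reference))
```

Because the selected toy portion is almost perfectly linear, the interpolation errors here come out near zero; the point is the shape of the pipeline, not the numbers. With real data, `linear_impute` would be replaced by the machine learning imputers under comparison (for example, scikit-learn's `KNNImputer`), and the criteria and weights would need to reflect the study's own notion of representativeness.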
