Imputation (statistics)
Missing data
Computer science
Artificial intelligence
Machine learning
Authors
Priyanka Bathla, Rajneesh Kumar
Identifier
DOI:10.1109/icccis60361.2023.10425097
Abstract
Missing values in medical datasets have a significant impact on the accuracy of machine learning models. Imputing missing data is essential when every available data point must be used and records with missing values must be retained. The goal of this paper is to identify an effective imputation technique; to that end, this work evaluates the performance of several methods for handling missing values. For the study, the authors employed the Stroke Prediction dataset from Kaggle. Missing values were imputed using four techniques: Multivariate Imputation by Chained Equations (MICE), K-Nearest Neighbour (KNN), MEAN, and MODE. Each imputation technique was then evaluated with a Random Forest (RF) classifier to find the best imputation method. The findings reveal that MEAN imputation provides the most promising results among the methods tested. The authors also present a methodology that applies imputation to brain stroke prediction and analyze each step of the process in detail.
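The abstract compares simple and neighbour-based fills without showing how they differ. What follows is a minimal, stdlib-only Python sketch of three of the four techniques (MEAN, MODE, and KNN) on a tiny made-up table. The column layout, the values, and the helper names `impute_mean`, `impute_mode`, and `impute_knn` are hypothetical and are not taken from the paper or the Stroke Prediction dataset. MICE and the Random Forest evaluation step are not reproduced here; in practice one would typically use a library implementation such as scikit-learn's `IterativeImputer` and `RandomForestClassifier`.

```python
import math
from statistics import mean, mode

# Hypothetical toy records: (age, avg_glucose_level, bmi).
# None marks a missing bmi; the other columns are assumed complete.
ROWS = [
    [50.0, 100.0, 25.0],
    [60.0, 120.0, 30.0],
    [52.0, 105.0, None],
    [70.0, 150.0, 30.0],
    [65.0, 140.0, None],
]
BMI = 2  # index of the column with gaps


def impute_mean(col):
    """Replace every None with the mean of the observed values."""
    fill = mean([v for v in col if v is not None])
    return [fill if v is None else v for v in col]


def impute_mode(col):
    """Replace every None with the most frequent observed value."""
    fill = mode([v for v in col if v is not None])
    return [fill if v is None else v for v in col]


def impute_knn(rows, target, k=2):
    """Fill each missing rows[i][target] with the mean target value
    of the k nearest rows whose target is observed.

    Distance is plain Euclidean over the other (unscaled) columns.
    Returns new lists; the input rows are not modified.
    """
    def features(row):
        return [v for j, v in enumerate(row) if j != target]

    donors = [r for r in rows if r[target] is not None]
    result = []
    for row in rows:
        filled = list(row)
        if filled[target] is None:
            nearest = sorted(
                donors, key=lambda d: math.dist(features(row), features(d))
            )[:k]
            filled[target] = mean([d[target] for d in nearest])
        result.append(filled)
    return result


if __name__ == "__main__":
    bmi = [r[BMI] for r in ROWS]
    print("MEAN:", impute_mean(bmi))
    print("MODE:", impute_mode(bmi))
    # KNN (k=2) fills the two gaps with 27.5 and 30.0
    print("KNN: ", [r[BMI] for r in impute_knn(ROWS, BMI, k=2)])
```

MEAN and MODE apply a single value to every gap, whereas KNN borrows from the records most similar on the other features, so the two missing rows receive different fills. Because the distance here is unscaled Euclidean, a real pipeline would normally standardize the features first, and it would compute all fill values on the training split only so that test information does not leak into the classifier evaluation.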