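Missing data
Imputation (statistics)
Neuroimaging
Dementia
Random forest
Computer science
Artificial intelligence
Alzheimer's Disease Neuroimaging Initiative
Machine learning
Cognitive impairment
Cognition
Algorithm
Psychology
Medicine
Disease
Psychiatry
Internal medicine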
Authors
Federica Aracri,Maria Giovanna Bianco,Andrea Quattrone,Alessia Sarica
Identifier
DOI:10.1109/cbms58004.2023.00300
Abstract
Missing values are common in international neuroscience and neuroimaging databases. Because many statistical methods and Machine Learning (ML) algorithms are not designed to work with missing data, all variables associated with the affected records are usually removed, which loses information and degrades the performance of classifiers for neurodegenerative diseases such as dementia. A reliable alternative is imputation, which substitutes missing values, for example with the mean (I_mean), a widely applied approach. Recently, missForest (MF), a Random Forest-based algorithm, has become popular for handling missing data in biomedical research. We therefore aimed to assess the reliability of MF in solving the missingness problem in a cohort of Mild Cognitive Impairment (MCI) and Alzheimer's disease (AD) patients from the international Alzheimer's Disease Neuroimaging Initiative (ADNI) database, with clinical, cognitive and neuroimaging features. First, we amputed the complete dataset with increasing percentages of missing data (from 10% to 80%) under a Missing Completely At Random (MCAR) mechanism. Then, we applied the I_mean and MF approaches to the amputed datasets and compared their imputation errors (RMSE, NRMSE, MAE). When the error averaged over all features was considered, MF outperformed I_mean at every amputation percentage. However, when errors on single features were compared, MF performed slightly worse than I_mean on the cognitive features ADAS, RAVLT and MMSE, regardless of the amputation percentage. We conclude that missForest is a reliable imputation algorithm for handling missing neuroscience data, although it should be used with caution on highly skewed variables, such as cognitive scores.
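
The evaluation loop described in the abstract (ampute under MCAR, impute, score against the known truth) can be sketched as follows. This is a minimal illustration and not the authors' code: it covers only the mean-imputation baseline, uses a synthetic feature in place of ADNI data, assumes 10-point steps between 10% and 80%, and normalises NRMSE by the standard deviation of the true amputed values, which is one of several conventions (the abstract does not say which was used). The helper names `ampute_mcar`, `impute_mean` and `imputation_errors` are hypothetical.

```python
import math
import random


def ampute_mcar(column, fraction, rng):
    """Copy `column`, setting a uniformly random `fraction` of entries to None (MCAR)."""
    n_missing = round(fraction * len(column))
    missing_idx = set(rng.sample(range(len(column)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(column)]


def impute_mean(column):
    """Replace every None with the mean of the observed entries."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def imputation_errors(truth, amputed, imputed):
    """Score imputed values against the truth, on amputed positions only."""
    pairs = [(t, p) for t, a, p in zip(truth, amputed, imputed) if a is None]
    n = len(pairs)
    mae = sum(abs(t - p) for t, p in pairs) / n
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in pairs) / n)
    # Assumed convention: normalise RMSE by the std of the true amputed values.
    mu = sum(t for t, _ in pairs) / n
    sd = math.sqrt(sum((t - mu) ** 2 for t, _ in pairs) / n)
    nrmse = rmse / sd if sd > 0 else float("nan")
    return {"rmse": rmse, "nrmse": nrmse, "mae": mae}


rng = random.Random(0)
# Synthetic stand-in for one complete feature; this is NOT ADNI data.
truth = [rng.gauss(25.0, 3.0) for _ in range(200)]

results = {}
for pct in range(10, 90, 10):  # 10%, 20%, ..., 80% (step size is an assumption)
    amputed = ampute_mcar(truth, pct / 100, rng)
    results[pct] = imputation_errors(truth, amputed, impute_mean(amputed))
    print(pct, results[pct])
```

To compare against a missForest-style imputer, the same `imputation_errors` scoring could be applied to its output on the same amputed data; scoring only the amputed positions keeps the comparison between methods fair.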