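Missing data
Imputation (statistics)
Statistics
Logistic regression
Monte Carlo method
Null hypothesis
Regression analysis
Mathematics
Econometrics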
Identifier
DOI:10.1016/j.jclinepi.2004.11.029
Abstract
The purpose of this study was to determine the effect of three common approaches to handling missing data on the results of a predictive model. A Monte Carlo simulation study using simulated data was performed. A baseline logistic regression using complete data was fitted to predict hospital admission based on the white blood cell count (WBC) (dichotomized as normal or high), presence of fever, and procedures performed (PROC). A series of simulations was then performed in which WBC data were deleted for varying proportions (15-85%) of patients under various patterns of missingness. Three analytic approaches were used: analysis restricted to cases with complete data (CC), missing data assumed to be normal (MAN), and use of imputed values. In the baseline analysis, all three predictors were significantly associated with admission. Using either the MAN approach or imputation, the odds ratio (OR) for WBC was substantially over- or underestimated depending on the missingness pattern, and there was considerable bias toward the null in the OR estimates for fever. In the CC analyses, the OR for WBC was consistently biased toward the null, the OR for PROC was biased away from the null, and the OR for fever was biased toward or away from the null. Estimates of overall model discrimination were substantially biased using all analytic approaches. All three methods of handling large amounts of missing data can lead to biased estimates of the OR and of model performance in predictive models. Predictor variables that are measured inconsistently can affect the validity of such models.
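The comparison described in the abstract can be illustrated with a minimal Python sketch of a single simulation run. It is not the authors' code: the sample size, true coefficients, predictor prevalences, the fever-dependent missingness pattern, and the simple stratified random imputation are all assumptions chosen for illustration, and the helper names (`solve`, `fit_odds_ratios`, `impute_wbc`) are hypothetical. The logistic regression is fitted with a small hand-written Newton-Raphson routine so the example needs only the standard library.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_odds_ratios(rows, iters=15):
    """Fit admit ~ WBC + fever + PROC by Newton-Raphson; return the three ORs."""
    beta = [0.0] * 4
    for _ in range(iters):
        grad = [0.0] * 4
        hess = [[0.0] * 4 for _ in range(4)]
        for wbc, fever, proc, admit in rows:
            x = (1.0, wbc, fever, proc)
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            for i in range(4):
                grad[i] += (admit - p) * x[i]
                for j in range(4):
                    hess[i][j] += p * (1.0 - p) * x[i] * x[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return [math.exp(b) for b in beta[1:]]

def impute_wbc(rows, rng):
    """Assumed imputation: draw missing WBC from its observed rate within each fever stratum."""
    p_high = {}
    for f in (0, 1):
        seen = [r[0] for r in rows if r[1] == f and r[0] is not None]
        p_high[f] = sum(seen) / len(seen)
    filled = []
    for wbc, fever, proc, admit in rows:
        if wbc is None:
            wbc = 1 if rng.random() < p_high[fever] else 0
        filled.append((wbc, fever, proc, admit))
    return filled

rng = random.Random(0)

# Complete data; every probability and coefficient here is an illustrative assumption.
full = []
for _ in range(2000):
    wbc = 1 if rng.random() < 0.3 else 0
    fever = 1 if rng.random() < (0.5 if wbc else 0.2) else 0  # fever correlated with WBC
    proc = 1 if rng.random() < 0.25 else 0
    z = -2.0 + 1.0 * wbc + 0.8 * fever + 1.2 * proc
    admit = 1 if rng.random() < 1.0 / (1.0 + math.exp(-z)) else 0
    full.append((wbc, fever, proc, admit))

# Assumed missingness pattern: WBC is measured less often in afebrile patients.
observed = [(None if rng.random() < (0.3 if r[1] else 0.7) else r[0],) + r[1:] for r in full]

complete_cases = [r for r in observed if r[0] is not None]              # CC
assumed_normal = [(0 if r[0] is None else r[0],) + r[1:] for r in observed]  # MAN
imputed = impute_wbc(observed, rng)                                     # imputation

results = {
    "baseline": fit_odds_ratios(full),
    "CC": fit_odds_ratios(complete_cases),
    "MAN": fit_odds_ratios(assumed_normal),
    "imputed": fit_odds_ratios(imputed),
}
for name, ors in results.items():
    print(name, ["%.2f" % v for v in ors])  # ORs for WBC, fever, PROC
```

A full Monte Carlo study would repeat this run many times for each missingness proportion and pattern and average the deviation of each OR from its baseline value; a single run like this one only shows the mechanics of the three approaches.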