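Inverse probability weighting
Missing data
Imputation (statistics)
Weighting
Inverse probability
Computer science
Estimator
Statistics
Data mining
Robustness (evolution)
Econometrics
Mathematics
Posterior probability
Artificial intelligence
Machine learning
Medicine
Bayesian probability
Gene
Radiology
Biochemistry
Chemistry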
Authors
Fuyu Guo, Benjamin Langworthy, Shuji Ogino, Molin Wang
Identifier
DOI:10.1177/09622802231226328
Abstract
Identifying and distinguishing risk factors for heterogeneous disease subtypes has been of great interest. However, missingness in disease subtypes is a common problem in such analyses. Several methods have been proposed to deal with the missing data, including complete-case analysis, inverse-probability weighting, and multiple imputation. Although the extant literature has compared these methods in missing-data problems, none has focused on the competing risk setting. In this paper, we discuss the assumptions required when complete-case analysis, inverse-probability weighting, and multiple imputation are used to deal with the missing failure subtype problem, focusing on how to implement these methods under various realistic scenarios in competing risk settings. We also compare these three methods regarding their biases, efficiency, and robustness to model misspecifications using simulation studies. Our results show that complete-case analysis can be seriously biased when the missing completely at random assumption does not hold. Inverse-probability weighting and multiple imputation estimators are valid when we correctly specify the corresponding models for missingness and for imputation, and multiple imputation typically shows higher efficiency than inverse-probability weighting. However, in real-world studies, building imputation models for the missing subtypes can be more challenging than building missingness models. In that case, inverse-probability weighting could be preferred for its ease of use. We also propose two automated model selection procedures and demonstrate their use in a study of the association between smoking and colorectal cancer subtypes in the Nurses' Health Study and Health Professionals Follow-up Study.
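The contrast between complete-case analysis and inverse-probability weighting can be illustrated with a small simulation. The sketch below is not the authors' procedure and does not involve competing risks or regression modeling: it only estimates the proportion of cases with a given subtype when subtype measurement depends on an observed binary covariate (missing at random, but not missing completely at random). The function names `ipw_subtype_proportion` and `simulate`, the covariate `stratum`, and all numeric settings are assumptions made for illustration.

```python
import numpy as np


def ipw_subtype_proportion(subtype, observed, stratum):
    """Estimate P(subtype == 1) among cases when some subtypes are missing.

    subtype  : 0/1 array; entries where observed == 0 are never used
    observed : 1 if the subtype was measured, 0 if it is missing
    stratum  : discrete covariate assumed to drive missingness (MAR)
    """
    subtype = np.asarray(subtype)
    observed = np.asarray(observed).astype(bool)
    stratum = np.asarray(stratum)

    # Missingness model (assumption): P(observed | stratum), estimated
    # by the fraction of measured subtypes within each stratum.
    p_obs = np.empty(len(stratum), dtype=float)
    for s in np.unique(stratum):
        mask = stratum == s
        p_obs[mask] = observed[mask].mean()

    # Each measured case is weighted by the inverse of its probability
    # of being measured; the weights are normalized to sum to one.
    weights = 1.0 / p_obs[observed]
    return np.sum(weights * subtype[observed]) / np.sum(weights)


def simulate(n=200_000, seed=0):
    """Hypothetical data: subtype and missingness both depend on stratum."""
    rng = np.random.default_rng(seed)
    stratum = rng.integers(0, 2, size=n)
    subtype = rng.binomial(1, np.where(stratum == 1, 0.8, 0.2))
    observed = rng.binomial(1, np.where(stratum == 1, 0.9, 0.3))
    return subtype, observed, stratum


subtype, observed, stratum = simulate()
truth = subtype.mean()                          # uses the full data
complete_case = subtype[observed == 1].mean()   # ignores missingness
ipw = ipw_subtype_proportion(subtype, observed, stratum)
print(f"truth={truth:.3f} complete_case={complete_case:.3f} ipw={ipw:.3f}")
```

Under these assumed settings the complete-case estimate over-represents the stratum in which subtypes are measured more often, while the weighted estimate stays close to the full-data proportion, provided the missingness model is correctly specified.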