Using machine‐learning algorithms to improve imputation in the medical expenditure panel survey

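Keywords: Imputation (statistics); Gradient boosting; Computer science; Random forest; Medical Expenditure Panel Survey; Support vector machine; Regression; Machine learning; Linear regression; Data mining; Statistics; Artificial intelligence; Econometrics; Algorithm; Missing data; Mathematics; Health care; Economics; Economic growth; Health insurance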

Authors

Chandler McClellan, Emily M. Mitchell, Jerrod Anderson, Samuel H. Zuvekas

Source

Journal: Health Services Research [Wiley]
Date: 2022-12-25 Volume (issue): 58 (2): 423-432 Citations: 3

Identifier

DOI: 10.1111/1475-6773.14115

Abstract

Objective: To assess the feasibility of applying machine learning (ML) methods to imputation in the Medical Expenditure Panel Survey (MEPS).

Data Sources: All data come from the 2016–2017 MEPS.

Study Design: Currently, expenditures for medical encounters in the MEPS are imputed with a predictive mean matching (PMM) algorithm, in which a linear regression model predicts expenditures for events with data (donors) and without data (recipients). Each recipient event is then matched to the donor event with the smallest distance between predicted expenditures, and the donor event's observed expenditures are used as the recipient event's imputation. We replace the linear regression algorithm in the PMM framework with ML methods to predict expenditures. We examine five alternatives to linear regression: Gradient Boosting, Random Forests, Extreme Random Forests, Deep Neural Networks, and a Stacked Ensemble approach. Additionally, we introduce an alternative matching scheme that matches on a vector of predicted expenditures by source of payment, instead of a single total expenditure prediction, to generate potentially superior matches.

Data Collection: Study data are derived from a large federal survey.

Principal Findings: ML algorithms perform better at both prediction and matching imputation than Ordinary Least Squares (OLS), the most common prediction algorithm used in PMM. On average, the Stacked Ensemble approach that combines all the ML algorithms performs best, improving expenditure prediction R² by 108% (0.156 points) and final imputation R² by 227% (0.397 points). Matching on a prediction vector also improves the alignment of sources of payment between donor and recipient events.

Conclusions: ML algorithms and an alternative matching scheme improve the overall quality of expenditure PMM imputation in the MEPS. These methods may have additional value in other national surveys that currently rely on PMM or similar methods for imputation.
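The matching step described in the Study Design can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `pmm_match`, the toy numbers, and the use of Euclidean distance for the vector case are assumptions made here for illustration, since the abstract does not state the distance metric. The predictions passed in could come from any model (OLS, gradient boosting, a stacked ensemble, and so on); only the nearest-donor matching is shown.

```python
import numpy as np


def pmm_match(pred_donor, pred_recipient, donor_observed):
    """Nearest-donor matching on predicted expenditures (illustrative sketch).

    pred_donor / pred_recipient may be 1-D (one total prediction per event)
    or 2-D (one predicted amount per source of payment per event).
    Returns the imputed values and the index of the matched donor.
    """
    pd_ = np.asarray(pred_donor, dtype=float)
    pr_ = np.asarray(pred_recipient, dtype=float)
    # Treat a 1-D prediction as a vector of length one per event.
    pd_ = pd_.reshape(len(pd_), -1)
    pr_ = pr_.reshape(len(pr_), -1)
    # Pairwise distances, shape (n_recipients, n_donors).
    # Assumption: Euclidean distance; the source does not specify the metric.
    dist = np.linalg.norm(pr_[:, None, :] - pd_[None, :, :], axis=2)
    nearest = dist.argmin(axis=1)
    # The donor's observed (not predicted) expenditures become the imputation.
    return np.asarray(donor_observed)[nearest], nearest


# Scalar matching: one predicted total per event (hypothetical numbers).
imputed_total, idx_total = pmm_match(
    pred_donor=[100.0, 250.0, 900.0],
    pred_recipient=[240.0, 880.0],
    donor_observed=[120.0, 300.0, 1000.0],
)

# Vector matching: predictions split by two hypothetical payment sources.
# Both donors have the same predicted total (100), so a scalar match cannot
# tell them apart, while the vector match picks the donor with a similar mix.
donor_vec = [[100.0, 0.0], [0.0, 100.0]]
recip_vec = [[10.0, 90.0]]
donor_obs_vec = np.array([[110.0, 0.0], [0.0, 95.0]])
imputed_vec, idx_vec = pmm_match(donor_vec, recip_vec, donor_obs_vec)

print(imputed_total, idx_total)
print(imputed_vec, idx_vec)
```

In the second call the recipient receives the whole observed payment vector of its matched donor, which is one way the source-of-payment mix could stay aligned between donor and recipient events.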
