摘要
Propensity score weighting methods are often used in non-randomized studies to adjust for confounding and assess treatment effects. The most popular among them, the inverse probability weighting, assigns weights that are proportional to the inverse of the conditional probability of a specific treatment assignment, given observed covariates. A key requirement for inverse probability weighting estimation is the positivity assumption, i.e. the propensity score must be bounded away from 0 and 1. In practice, violations of the positivity assumption often manifest by the presence of limited overlap in the propensity score distributions between treatment groups. When these practical violations occur, a small number of highly influential inverse probability weights may lead to unstable inverse probability weighting estimators, with biased estimates and large variances. To mitigate these issues, a number of alternative methods have been proposed, including inverse probability weighting trimming, overlap weights, matching weights, and entropy weights. Because overlap weights, matching weights, and entropy weights target the population for whom there is equipoise (and with adequate overlap) and their estimands depend on the true propensity score, a common criticism is that these estimators may be more sensitive to misspecifications of the propensity score model. In this paper, we conduct extensive simulation studies to compare the performances of inverse probability weighting and inverse probability weighting trimming against those of overlap weights, matching weights, and entropy weights under limited overlap and misspecified propensity score models. Across the wide range of scenarios we considered, overlap weights, matching weights, and entropy weights consistently outperform inverse probability weighting in terms of bias, root mean squared error, and coverage probability.