Authors
Shirley Wang, Sebastian Schneeweiß, Jessica M. Franklin, Rishi Desai, William B. Feldman, Elizabeth M. Garry, Robert J. Glynn, Kueiyu Joshua Lin, Julie M. Paik, Elisabetta Patorno, Samy Suissa, Elvira D'Andrea, Dureshahwar Jawaid, Hemin Lee, Ajinkya Pawar, Sushama Kattinakere Sreedhara, Helen Tesfaye, Lily G. Bessette, Luke E. Zabotka, Su Been Lee, Nileesa Gautam, Cassie York, Heidi Zakoul, John Concato, David Martin, Dianne Paraoan, Kenneth Quinto
Abstract
Importance
Nonrandomized studies using insurance claims databases can be analyzed to produce real-world evidence on the effectiveness of medical products. Given the lack of baseline randomization and measurement issues, concerns exist about whether such studies produce unbiased treatment effect estimates.

Objective
To emulate the design of 30 completed and 2 ongoing randomized clinical trials (RCTs) of medications with database studies using observational analogues of the RCT design parameters (population, intervention, comparator, outcome, time [PICOT]) and to quantify agreement in RCT-database study pairs.

Design, Setting, and Participants
New-user cohort studies with propensity score matching using 3 US claims databases (Optum Clinformatics, MarketScan, and Medicare). Inclusion-exclusion criteria for each database study were prespecified to emulate the corresponding RCT. RCTs were explicitly selected based on feasibility, including power, key confounders, and end points more likely to be emulated with real-world data. All 32 protocols were registered on ClinicalTrials.gov before conducting analyses. Emulations were conducted from 2017 through 2022.

Exposures
Therapies for multiple clinical conditions were included.

Main Outcomes and Measures
Database study emulations focused on the primary outcome of the corresponding RCT. Findings of database studies were compared with RCTs using predefined metrics, including Pearson correlation coefficients and binary metrics based on statistical significance agreement, estimate agreement, and standardized difference.

Results
In these highly selected RCTs, the overall observed agreement between the RCT and the database emulation results was a Pearson correlation of 0.82 (95% CI, 0.64-0.91), with 72% meeting statistical significance, 66% estimate agreement, and 75% standardized difference agreement.
In a post hoc analysis limited to 16 RCTs with closer emulation of trial design and measurements, concordance was higher (Pearson r, 0.93; 95% CI, 0.79-0.97; 94% meeting statistical significance, 88% estimate agreement, 88% standardized difference agreement). Weaker concordance occurred among 16 RCTs for which close emulation of certain design elements that define the research question (PICOT) with data from insurance claims was not possible (Pearson r, 0.53; 95% CI, 0.00-0.83; 50% meeting statistical significance, 50% estimate agreement, 69% standardized difference agreement).

Conclusions and Relevance
Real-world evidence studies can reach conclusions similar to those of RCTs when design and measurements can be closely emulated, but this may be difficult to achieve. Concordance in results varied depending on the agreement metric. Emulation differences, chance, and residual confounding can all contribute to divergence in results and are difficult to disentangle.