Authors
Weiqi Liao, Carol Coupland, Judith Burchardt, David Baldwin, Fergus Gleeson, Julia Hippisley‐Cox, George Batchkala, James Buchanan, Rohan Chakraborty, Rishi Chana, Yan Chen, Charles Crichton, Jim Davies, Anand Devaraj, Mengran Fan, Rositsa Koleva‐Kolarova, Richard Lee, Arjun Nair, L. Pickup, Anne Powell, Jens Rittscher, Amied Shadmaan, Kandavel Shanmugam, Elizabeth A Stokes, Clare Verrill, Johnathan Watkins, Sarah Wordsworth
Abstract
Background Lung cancer is the second most common cancer in incidence and the leading cause of cancer deaths worldwide. Lung cancer screening with low-dose CT can reduce mortality. The UK National Screening Committee recommended targeted lung cancer screening on Sept 29, 2022, and asked for more modelling work to be done to help refine the recommendation. This study aims to develop and validate a risk prediction model, the CanPredict (lung) model, for lung cancer screening in the UK and to compare its performance against seven other risk prediction models.

Methods For this retrospective, population-based, cohort study, we used linked electronic health records from two English primary care databases: QResearch (Jan 1, 2005, to March 31, 2020) and Clinical Practice Research Datalink (CPRD) Gold (Jan 1, 2004, to Jan 1, 2015). The primary study outcome was an incident diagnosis of lung cancer. We used a Cox proportional-hazards model in the derivation cohort (12.99 million individuals aged 25-84 years from the QResearch database) to develop the CanPredict (lung) model in men and women. We used discrimination measures (Harrell's C statistic, D statistic, and the explained variation in time to diagnosis of lung cancer [R²D]) and calibration plots to evaluate model performance by sex and ethnicity, using data from QResearch (4.14 million people for internal validation) and CPRD (2.54 million for external validation). Seven models for predicting lung cancer risk (Liverpool Lung Project [LLP]v2, LLPv3, Lung Cancer Risk Assessment Tool [LCRAT], Prostate, Lung, Colorectal, and Ovarian [PLCO]M2012, PLCOM2014, Pittsburgh, and Bach) were selected to compare their model performance with the CanPredict (lung) model using two approaches: (1) in ever-smokers aged 55-74 years (the population recommended for lung cancer screening in the UK), and (2) in the populations for each model determined by that model's eligibility criteria.

Findings There were 73 380 incident lung cancer cases in the QResearch derivation cohort, 22 838 cases in the QResearch internal validation cohort, and 16 145 cases in the CPRD external validation cohort during follow-up. The predictors in the final model included sociodemographic characteristics (age, sex, ethnicity, Townsend score), lifestyle factors (BMI, smoking and alcohol status), comorbidities, family history of lung cancer, and personal history of other cancers. Some predictors were different between the models for women and men, but model performance was similar between sexes. The CanPredict (lung) model showed excellent discrimination and calibration in both internal and external validation of the full model, by sex and ethnicity. The model explained 65% of the variation in time to diagnosis of lung cancer (R²D) in both sexes in the QResearch validation cohort and 59% in both sexes in the CPRD validation cohort. Harrell's C statistics were 0.90 in the QResearch (validation) cohort and 0.87 in the CPRD cohort, and the D statistics were 2.8 in the QResearch (validation) cohort and 2.4 in the CPRD cohort. Compared with seven other lung cancer prediction models, the CanPredict (lung) model had the best performance in discrimination, calibration, and net benefit across three prediction horizons (5, 6, and 10 years) in the two approaches. The CanPredict (lung) model also had higher sensitivity than the current UK recommended models (LLPv2 and PLCOM2012), as it identified more lung cancer cases than those models when screening the same number of individuals at high risk.

Interpretation The CanPredict (lung) model was developed, and internally and externally validated, using data from 19.67 million people from two English primary care databases. Our model has potential utility for risk stratification of the UK primary care population and selection of individuals at high risk of lung cancer for targeted screening. If our model is recommended to be implemented in primary care, each individual's risk can be calculated using information in the primary care electronic health records, and people at high risk can be identified for the lung cancer screening programme.
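
The D statistic and the explained variation R²D reported in the findings are linked by a closed-form relation, so one can be checked against the other. The sketch below is illustrative only and is not the authors' code: it assumes the standard Royston and Sauerbrei definition for a Cox model, R²D = (D²/κ²) / (σ² + D²/κ²) with κ² = 8/π and σ² = π²/6, and the function name `r2d_from_d` is hypothetical. The inputs are the D statistics as rounded in the abstract.

```python
import math

# Constants from the Royston and Sauerbrei definition (assumed here, not stated in the abstract).
KAPPA_SQ = 8 / math.pi        # scaling constant kappa squared
SIGMA_SQ = math.pi ** 2 / 6   # error variance used for a Cox proportional-hazards model


def r2d_from_d(d: float) -> float:
    """Convert a D statistic into R²D, the explained variation in time to event."""
    signal = d ** 2 / KAPPA_SQ
    return signal / (SIGMA_SQ + signal)


# D statistics as rounded in the abstract (hypothetical usage).
for cohort, d in [("QResearch validation", 2.8), ("CPRD", 2.4)]:
    print(f"{cohort}: D = {d} -> R2D = {r2d_from_d(d):.2f}")
```

With these rounded inputs the relation gives about 0.65 for QResearch, matching the reported 65%, and about 0.58 for CPRD, close to the reported 59%; the small gap is plausibly due to D being rounded to one decimal place.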