Prospective cohort study
Machine learning
Computer science
Cohort
Artificial intelligence
Cancer
Algorithm
Medicine
Internal medicine
Authors
Xifeng Wu, Huakang Tu, Qingfeng Hu, Shan P. Tsai, David Ta‐Wei Chu, Chi Pang Wen
Identifier
DOI:10.1136/bmjonc-2023-000087
Abstract
Objective: To develop and validate machine-learning models that predict the risk of pan-cancer incidence using demographic, questionnaire and routine health check-up data in a large Asian population.
Methods and analysis: This prospective cohort study included 433 549 participants from the MJ cohort, comprising a male cohort (n=208 599) and a female cohort (n=224 950).
Results: During an 8-year median follow-up, 5143 cancers occurred in males and 4764 in females. Compared with Lasso-Cox and Random Survival Forests, XGBoost showed superior performance in both cohorts. The XGBoost model with all 155 features in males and all 160 features in females achieved an area under the curve (AUC) of 0.877 and 0.750, respectively. Light models with 31 variables for males and 11 variables for females showed comparable performance. In the male cohort, the AUC was 0.876 (95% CI 0.858 to 0.894) in the overall population and 0.818 (95% CI 0.795 to 0.841) in those aged ≥40 years. In the female cohort, the AUC was 0.746 (95% CI 0.721 to 0.771) in the overall population and 0.641 (95% CI 0.605 to 0.677) in those aged ≥40 years. High-risk individuals had at least a ninefold higher risk of pan-cancer incidence than low-risk groups.
Conclusion: We developed and internally validated the first machine-learning models based on routine health check-up data to predict pan-cancer risk in the general population and achieved generally good discriminatory ability with a small set of predictors. External validation is warranted before the implementation of our risk model in clinical practice.
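For readers unfamiliar with the AUC figures quoted in the abstract, the sketch below shows one common way to read the metric: the probability that a randomly chosen participant who developed cancer received a higher predicted risk than a randomly chosen participant who did not. It is only an illustration under simplifying assumptions. It treats the outcome as binary and ignores censoring and follow-up time, which the study's survival models may handle differently. The function name `auc` and the toy scores and labels are hypothetical and are not taken from the paper.

```python
def auc(scores, labels):
    """Pairwise AUC: share of (case, non-case) pairs in which the case
    has the higher score; tied pairs count as half a win."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")

    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


# Made-up predicted risks and outcomes (1 = cancer, 0 = no cancer).
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_labels = [1, 0, 1, 0, 1, 0]
toy_auc = auc(toy_scores, toy_labels)
print(f"toy AUC = {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to ranking cases and non-cases no better than chance, and 1.0 to ranking every case above every non-case. The pairwise loop is quadratic in sample size, so a rank-based implementation would be preferable for a cohort of this scale.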