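Medicine
Cohort
Feature selection
Pneumonia
Random forest
Internal medicine
Machine learning
Oncology
Cancer
Artificial intelligence
Computer science
Lung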
Authors
Levente Lippenszky,Kathleen F. Mittendorf,Zoltán Kiss,Michele L. Lenoue-Newton,Pablo Napan-Molina,Protiva Rahman,Cheng Ye,Balázs Laczi,Eszter Csernai,Neha Jain,Marilyn Holt,C. Noel Maxwell,Madeleine Ball,Yufang Ma,Margaret B. Mitchell,Douglas B. Johnson,David S. Smith,Ben Ho Park,Christine Micheel,Daniel Fabbri,Jan Wolber,Travis Osterman
Source
Journal: JCO Clinical Cancer Informatics
[American Society of Clinical Oncology]
Date: 2024-03-01
Volume/Issue: (8)
Citations: 8
Abstract
PURPOSE Although immune checkpoint inhibitors (ICIs) have improved outcomes in certain patients with cancer, they can also cause life-threatening immunotoxicities. Predicting immunotoxicity risks alongside response could provide a personalized risk-benefit profile, inform therapeutic decision making, and improve clinical trial cohort selection. We aimed to build a machine learning (ML) framework using routine electronic health record (EHR) data to predict hepatitis, colitis, pneumonitis, and 1-year overall survival.

METHODS Real-world EHR data of more than 2,200 patients treated with ICI through December 31, 2018, were used to develop predictive models. Using a prediction time point of ICI initiation, a 1-year prediction time window was applied to create binary labels for the four outcomes for each patient. Feature engineering involved aggregating laboratory measurements over appropriate time windows (60-365 days). Patients were randomly partitioned into training (80%) and test (20%) sets. Random forest classifiers were developed using a rigorous model development framework.

RESULTS The patient cohort had a median age of 63 years and was 61.8% male. Patients predominantly had melanoma (37.8%), lung cancer (27.3%), or genitourinary cancer (16.4%). They were treated with PD-1 (60.4%), PD-L1 (9.0%), and CTLA-4 (19.7%) ICIs. Our models demonstrate reasonably strong performance, with AUCs of 0.739, 0.729, 0.755, and 0.752 for the pneumonitis, hepatitis, colitis, and 1-year overall survival models, respectively. Each model relies on an outcome-specific feature set, though some features are shared among models.

CONCLUSION To our knowledge, this is the first ML solution that assesses individual ICI risk-benefit profiles based predominantly on routine structured EHR data. As such, use of our ML solution will not require additional data collection or documentation in the clinic.
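
The data-preparation and evaluation steps named in METHODS (binary labels from a 1-year window after ICI initiation, laboratory values aggregated over a lookback window, a random 80/20 patient split, and AUC as the metric) can be illustrated with a minimal standard-library Python sketch. This is not the authors' pipeline: the function names, the choice of summary statistics, the direction of the lookback window (before ICI initiation), and the handling of missing events are all assumptions made for illustration, and the random forest itself is omitted (a library classifier would be fit on the resulting features).

```python
import random
from datetime import date, timedelta
from statistics import mean


def binary_label(ici_start, event_date, window_days=365):
    """Return 1 if the event occurs within window_days after ICI initiation.

    Assumption: a patient with no recorded event (None) is labeled 0.
    """
    if event_date is None:
        return 0
    window_end = ici_start + timedelta(days=window_days)
    return int(ici_start <= event_date <= window_end)


def aggregate_lab(measurements, ici_start, lookback_days):
    """Summarize one lab test over a lookback window ending at ICI initiation.

    measurements: list of (date, value) pairs for a single patient and test.
    Assumption: mean/min/max/count are used as example summary statistics.
    """
    window_start = ici_start - timedelta(days=lookback_days)
    values = [v for d, v in measurements if window_start <= d <= ici_start]
    if not values:
        return {"mean": None, "min": None, "max": None, "count": 0}
    return {
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        "count": len(values),
    }


def train_test_split_ids(patient_ids, test_fraction=0.2, seed=0):
    """Randomly partition patient IDs into (train, test) lists."""
    ids = sorted(patient_ids)
    random.Random(seed).shuffle(ids)  # seeded for reproducibility
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]


def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Splitting by patient ID rather than by individual record keeps all of a patient's measurements on one side of the partition, which is one common way to avoid leakage between training and test sets; the abstract does not state how the authors implemented their split.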