Authors
Sang Ho Lee, H. Geng, Jacinta Arnold, Richard A. Caruana, Yong Fan, Mark Rosen, Aditya Apte, Joseph O. Deasy, Jeffrey D. Bradley, Ying Xiao
Abstract
Purpose: Our objective was to use interpretable machine learning to choose dose-volume constraints on cardiopulmonary substructures (CPSs) associated with overall survival (OS) in radiation therapy for locally advanced non-small cell lung cancer.
Methods and Materials: A total of 428 patients with non-small cell lung cancer from Radiation Therapy Oncology Group 0617 were randomly divided into training/validation/test subsets (n = 230/149/49). Manual or automated contouring was performed to segment CPSs, including heart, atria, ventricles, aorta, left/right ventricle/atrium (LV+RV+LA+RA), inferior/superior vena cava, pulmonary artery, and pericardium. Peri (pericardium-heart), rest (heart-[LV+RV+LA+RA]), clinical target volume (CTV), and lungs-CTV contours were also obtained. Dose-volume histogram features were extracted, including minimum/mean dose to the hottest x% volume (Dx%[Gy]/MOHx%[Gy]), minimum/mean/maximum dose, percent volume receiving at least xGy (VxGy[%]), and overlapping volume of each CPS with planning target volume (PTV_Voverlap[%]). Clinical parameters were collected from the National Clinical Trials Network/Community Oncology Research Program data archive. Feature selection was performed using a sequence of multiblock sparse partial least squares regression, stability selection supervised principal component analysis, and Boruta. An explainable boosting machine (EBM) was trained using a conditional survival distribution-based approach for imputing censored data, which treats survival analysis as a regression problem. Harrell's C-index was used to evaluate the OS discrimination performance of EBM, Cox proportional hazards (CPH), random survival forest, extreme gradient boosting survival embeddings, and CPH deep neural network (DeepSurv) models in the test set. Dose-volume constraints were selected by applying a binary change point detection algorithm to Shapley additive explanations-based partial dependence functions.
Results: Selected features included LA_V60Gy(%), pericardium_D30%(Gy), lungs-CTV_PTV_Voverlap(%), RA_V55Gy(%), and received_cons_chemo. All models ranked LA_V60Gy(%) as the most important feature. EBM achieved the best performance for predicting OS, followed by extreme gradient boosting survival embeddings, random survival forest, DeepSurv, and CPH (C-index = 0.653, 0.646, 0.642, 0.638, and 0.632, respectively). EBM global explanations suggested the following conditions for improved OS: LA_V60Gy(%) < 25.6, lungs-CTV_PTV_Voverlap(%) < 1.1, pericardium_D30%(Gy) < 18.9, RA_V55Gy(%) < 19.5, and received_cons_chemo = ‘Yes’.
Conclusions: EBM can be used to discriminate OS while also guiding dose-volume constraint selection for optimal management of cardiac toxicity in lung cancer radiation therapy.
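The dose-volume histogram features defined in the abstract (Dx%, MOHx%, and VxGy) can be illustrated with a minimal Python sketch. It assumes a structure is represented as a flat array of per-voxel doses with equal voxel volumes; the function names and the toy dose values are hypothetical and are not taken from the study, whose extraction software may interpolate the histogram differently.

```python
import numpy as np


def hottest_voxels(dose_gy, x_percent):
    """Return the doses of the hottest x% of voxels, sorted high to low.

    Assumes equal-volume voxels, so a volume fraction equals a voxel fraction.
    """
    d = np.sort(np.asarray(dose_gy, dtype=float))[::-1]
    # Multiply before dividing so whole-number inputs avoid float rounding in ceil.
    k = max(1, int(np.ceil(x_percent * d.size / 100.0)))
    return d[:k]


def d_x(dose_gy, x_percent):
    """Dx%(Gy): minimum dose to the hottest x% of the volume."""
    return float(hottest_voxels(dose_gy, x_percent).min())


def moh_x(dose_gy, x_percent):
    """MOHx%(Gy): mean dose to the hottest x% of the volume."""
    return float(hottest_voxels(dose_gy, x_percent).mean())


def v_x(dose_gy, x_gy):
    """VxGy(%): percent of the volume receiving at least x Gy."""
    d = np.asarray(dose_gy, dtype=float)
    return float(100.0 * np.mean(d >= x_gy))


# Toy structure with ten voxels receiving 10, 20, ..., 100 Gy (illustrative only).
toy_dose = np.arange(10, 101, 10)

print("D30%(Gy)   =", d_x(toy_dose, 30))
print("MOH30%(Gy) =", moh_x(toy_dose, 30))
print("V60Gy(%)   =", v_x(toy_dose, 60))
```

For the toy array, the hottest 30% of voxels are those at 100, 90, and 80 Gy, so D30% is 80 Gy and MOH30% is 90 Gy; five of the ten voxels receive at least 60 Gy, so V60Gy is 50%. A feature such as LA_V60Gy(%) would correspond to `v_x` evaluated on the left atrium voxels at 60 Gy.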