Authors
Xiaosheng Li, Zailin Yang, Jieping Li, Guixue Wang, Anlong Sun, Ying Wang, Wei Zhang, Yao Liu, Haike Lei
Abstract
Background and objective: The pathological staging of non-Hodgkin lymphoma (NHL) is complex, its clinical manifestations are varied, and prognosis differs considerably among patients. To provide a useful reference for the early detection and effective treatment of NHL, we developed a random survival forest (RSF) prognostic model, a machine learning (ML) algorithm, using prospective cohort data collected at Chongqing Cancer Hospital from Jan 1, 2017 to Dec 31, 2019 (n = 1449). We compared it with the traditional cornerstone method, the Cox proportional hazards (CPH) model, and evaluated its predictive performance.
Methods: Patients were randomly split into a training cohort (TC) and a validation cohort (VC) at a 65/35 ratio. Least absolute shrinkage and selection operator (LASSO) regression was used to extract the important features. The RSF model was then built in the TC to explore the prognostic factors affecting the overall survival (OS) of patients with NHL, and was validated in the VC. The C-index, the integrated Brier score (IBS), the Kaplan-Meier method, the receiver operating characteristic (ROC) curve, and the area under the ROC curve (AUC) were used to assess the performance and discrimination of the models. In addition, individual survival probabilities were predicted for NHL patients.
Results: Based on the features extracted by the LASSO model and the univariable Cox model, 16 variables were selected as important prognostic factors to develop the RSF model with the log-rank splitting rule: age, ethnicity, medical insurance, Ann Arbor stage, pathology, targeted therapy, chemotherapy, peripheral blood neutrophil-to-lymphocyte count ratio (NLR), peripheral blood platelet-to-lymphocyte count ratio (PLR), serum lactate dehydrogenase (LDH), CD4/CD8, platelet (PLT), absolute neutrophil count (ANC), lymphocyte (LYM), B-symptoms, and (CPR).
Compared with the CPH model (C-index = 0.748, IBS = 0.166), the RSF model (C-index = 0.786, IBS = 0.165) performed better in predictability and accuracy. The AUCs of the RSF model for 1-, 3-, and 5-year OS in the TC were 0.847, 0.847, and 0.809, respectively, while those of the CPH model were 0.816, 0.803, and 0.750, respectively.
Conclusions: To provide practical guidance for individualized therapy, the study constructed a high-performing RSF model and showed that it outperformed the traditional CPH model. The RSF model also ranked the risk variables. In addition, we stratified NHL patients by risk and estimated individual survival probabilities based on the RSF model.
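The C-index values reported for the two models summarize how often a model correctly orders pairs of patients by risk. As an illustration only, the following is a minimal sketch of Harrell's concordance index for right-censored data in plain Python. The function name `harrell_c_index`, the convention that a higher score means higher risk, the simplified handling of tied times, and the toy data are all assumptions made for this example; the abstract does not state how the study computed its C-index.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data (simplified sketch).

    times       -- observed follow-up time for each patient
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- model output; assumed here: higher score = higher risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b]:
            continue  # tied times are skipped in this simplified version
        if not events[a]:
            continue  # shorter time is censored: the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier death had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical, not from the study): five patients.
times = [5, 8, 12, 20, 25]            # follow-up in months
events = [1, 1, 0, 1, 0]              # 1 = death observed, 0 = censored
risk = [0.9, 0.4, 0.6, 0.3, 0.1]      # hypothetical model risk scores

print(harrell_c_index(times, events, risk))  # 0.875
```

In this toy example, 8 pairs are comparable and 7 are ordered correctly, giving 0.875. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported 0.748 (CPH) and 0.786 (RSF) should be read.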