Authors
Qingpeng Zhang, Ping Liang, Jiannan Yang, Weilan Wang, Guanjie Yuan, Min Han, Zhen Li
Abstract
Purpose: To explore the performance and intelligibility of machine-learning and deep-learning models for predicting end-stage renal disease (ESRD) from readily accessible clinical and laboratory features of patients with chronic kidney disease (CKD).
Materials and Methods: This single-center retrospective study included 2,382 patients diagnosed with CKD, of whom 1,765 were included in the modelling analysis. Eight models were used to predict whether a patient with CKD would progress to ESRD within three years, based on basic demographics, clinical information, and comorbidity information: Logistic Regression (LR); Ridge Regression Classification (RRC); Least Absolute Shrinkage and Selection Operator (LASSO); Support Vector Machine with a Gaussian kernel (SVM-RBF) and with a linear kernel (SVM-Linear); Random Forest (RF); XGBoost; and Deep Neural Network (DNN). LASSO, RF, and XGBoost were used to screen the input features for the most significant markers of ESRD. For the DNN model, we applied four attribution methods (Integrated Gradients, DeepLIFT, GradientSHAP, and Feature Ablation) to enhance model intelligibility.
Results: Age, follow-up duration, and 17 biochemical test outcomes (for instance, serum creatinine and hemoglobin) differed significantly between patients in four CKD stages. The DNN model achieved an area under the receiver operating characteristic curve (AUC-ROC) of 0.8843, which was significantly higher than those of the baseline models. Nonlinear machine-learning models (SVM-RBF, RF, XGBoost, and DNN) generally outperformed linear ones (LR, RRC, LASSO, and SVM-Linear). The interpretations generated by the DNN with attribution methods, RF, and XGBoost were consistent with clinical knowledge, whereas the LASSO-based interpretation was not. Hematuria, proteinuria, potassium, and urine albumin-to-creatinine ratio (ACR) were positively associated with the progression of CKD, while estimated glomerular filtration rate (eGFR) and urine creatinine were negatively associated with it. Hematuria was the most important independent risk predictor for the progression of diabetic nephropathy and urolithiasis.
Conclusion: The adopted DNN with attribution algorithms extracted intelligible features of CKD progression. In addition, the DNN model identified a number of critical but under-reported features, such as hematuria, that may be novel markers for the progression of CKD. This study provides physicians with solid data-driven evidence for using machine-learning and deep-learning models in CKD clinical management and treatment.
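To illustrate how an attribution method such as Integrated Gradients assigns a signed contribution to each input feature, the following is a minimal sketch, not the study's implementation. It assumes a toy logistic model in place of the DNN, and the feature names, weights, bias, and input values are hypothetical placeholders rather than values from the study. Attributions are computed along a straight path from an all-zero baseline to the input, and for this model they should sum approximately to the change in predicted risk.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, weights, bias):
    """Toy risk model (logistic regression) standing in for a DNN."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def gradient(x, weights, bias):
    """Analytic gradient of the model output with respect to each input."""
    p = predict(x, weights, bias)
    return [p * (1.0 - p) * w for w in weights]

def integrated_gradients(x, baseline, weights, bias, steps=200):
    """Midpoint Riemann-sum approximation of Integrated Gradients."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight line from the baseline to the input.
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gradient(point, weights, bias)):
            avg_grad[i] += g / steps
    # Scale the path-averaged gradient by the input's distance from baseline.
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]

# Hypothetical standardized features and weights (assumptions, not study data).
features = ["hematuria", "proteinuria", "eGFR"]
weights = [0.9, 0.6, -1.2]
bias = -0.5
x = [1.0, 1.0, 0.5]
baseline = [0.0, 0.0, 0.0]

attributions = integrated_gradients(x, baseline, weights, bias)
for name, value in zip(features, attributions):
    print(f"{name}: {value:+.4f}")
```

In this toy setting the positively weighted features receive positive attributions and the negatively weighted one receives a negative attribution, which mirrors the direction-of-association reading described in the Results.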