Comparative analysis of machine learning models for predicting pathological complete response to neoadjuvant chemotherapy in breast cancer: An MRI radiomics approach
This work compares machine learning models for predicting pathological complete response (pCR) to neoadjuvant chemotherapy (NAC) in breast cancer from radiomics features extracted from dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). The study included 55 patients with breast cancer: 18 achieved pCR and 37 did not respond completely to NAC (non-pCR). After pre-processing, 1446 features were extracted and harmonized for batch effects using ComBat. Five machine learning algorithms, namely random forest (RF), decision tree (DT), logistic regression (LR), k-nearest neighbors (k-NN), and extreme gradient boosting (XGB), were evaluated using the area under the receiver operating characteristic curve (AUC), accuracy, precision, recall, and F1 score as classification metrics. Leave-group-out cross-validation (LGOCV) was applied in the outer loop. RF and DT outperformed the other algorithms: DT achieved an accuracy of 0.96 ± 0.07 and RF an accuracy of 0.95 ± 0.05, with AUC values of 0.98 ± 0.06 for RF and 0.94 ± 0.07 for DT. LR and k-NN performed worse on all metrics, while XGB was competitive but slightly below RF and DT. These results demonstrate the potential of radiomics combined with machine learning for predicting pCR to NAC in breast cancer, with RF and DT proving the most effective at capturing the underlying patterns in the radiomics data. Further research is required to validate and strengthen the proposed approach and to explore its applicability to other radiomics datasets and clinical scenarios.
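As a rough illustration of the outer evaluation loop, the following dependency-free Python sketch generates stratified leave-group-out splits (repeated random train/test partitions that preserve the 18 pCR / 37 non-pCR ratio). It computes the five reported metrics for one held-out group and summarizes them as mean ± standard deviation across repetitions. This is a sketch under stated assumptions, not the study's implementation: the 20% test fraction, number of repetitions, 0.5 decision threshold, random seed, and the function names `stratified_lgocv_splits`, `binary_metrics`, and `summarize` are hypothetical. Model fitting, ComBat harmonization, and any inner tuning loop are omitted.

```python
import random
import statistics


def stratified_lgocv_splits(labels, test_fraction=0.2, n_repeats=100, seed=0):
    """Yield (train_idx, test_idx) for repeated stratified random hold-out (LGOCV).

    Hypothetical defaults: 20% of each class held out, 100 repetitions.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    for _ in range(n_repeats):
        test = []
        for idx in by_class.values():
            # Hold out the same fraction of each class to keep the pCR/non-pCR ratio
            n_test = max(1, round(test_fraction * len(idx)))
            test.extend(rng.sample(idx, n_test))
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, sorted(test)


def binary_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, precision, recall, F1 (at `threshold`) and AUC; class 1 = pCR."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # AUC as the probability that a random pCR case scores above a random non-pCR case (ties = 0.5)
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1, "auc": auc}


def summarize(per_split_metrics):
    """Mean and sample standard deviation of each metric across LGOCV repetitions."""
    keys = per_split_metrics[0].keys()
    return {k: (statistics.mean(m[k] for m in per_split_metrics),
                statistics.stdev(m[k] for m in per_split_metrics)) for k in keys}


# Class sizes from the study: 18 pCR (label 1), 37 non-pCR (label 0)
labels = [1] * 18 + [0] * 37
for train, test in stratified_lgocv_splits(labels, n_repeats=3):
    print(len(train), len(test), sum(labels[i] for i in test))  # 44 11 4 per split
```

In practice, each classifier (RF, DT, LR, k-NN, XGB) would be fitted on `train` and scored on `test`, `binary_metrics` would be called once per split, and `summarize` would give the per-metric mean ± SD. Whether the reported ± values were aggregated exactly this way is an assumption.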