Radiomics and machine learning model can improve the differentiation between ocular adnexal lymphoma and idiopathic orbital inflammation

Radio technology; Lymphoma; Inflammation; Medicine; Pathology; Computer science; Radiology; Immunology

Authors

Guorong Wang, Xiaoxia Qu, Jian Guo, Yongheng Luo, Junfang Xian

Source

Journal: Chinese Medical Journal [Ovid Technologies (Wolters Kluwer)]
Date: 2024-10-30

Identifiers

DOI: 10.1097/cm9.0000000000003356

Abstract

To the Editor: Distinguishing ocular adnexal lymphoma (OAL) from idiopathic orbital inflammation (IOI) is challenging owing to their similar clinical symptoms and imaging features. Previous research has demonstrated that radiological characteristics on magnetic resonance imaging (MRI) can help distinguish OAL from IOI. However, the diagnostic accuracy of these imaging findings depends largely on subjective interpretation, leading to inconsistent and sometimes controversial conclusions. Integrating MRI-based radiomics with machine learning (ML) is expected to extract quantitative features more objectively, enabling the construction of diagnostic models with improved accuracy. OAL accounts for 10–50% of orbital malignancies in adults, and low-dose radiotherapy is the recommended initial treatment.[1,2] IOI is an inflammatory process of the orbit of uncertain cause that responds well to oral corticosteroids. Because the two diseases are treated so differently yet present with similar symptoms and imaging characteristics, differentiating OAL from IOI is clinically essential. Biopsy is the gold standard, but it is invasive and carries risks. MRI offers a non-invasive alternative, and recent studies highlight its potential for distinguishing OAL from IOI using radiomics.[3] However, these studies were single-center and used a single algorithm. This study aimed to develop multiparametric MRI radiomics models using T1-weighted imaging (T1WI), T2-weighted imaging (T2WI), and T1-weighted contrast-enhanced (T1CE) images combined with various ML algorithms to distinguish between these two entities. We also sought to identify the optimal model and test its clinical applicability in an external test set. This retrospective study was approved by the Institutional Review Board of Beijing Tongren Hospital (No. TREC2023-KY107) and registered on ClinicalTrials.gov (NCT06336499). The requirement for informed consent was waived owing to the retrospective design.
We collected data from patients diagnosed with OAL or IOI at Beijing Tongren Hospital between January 2015 and March 2022. Inclusion criteria were: (1) pathologically confirmed OAL or IOI; (2) complete preoperative MRI data (T1WI, T2WI, and T1CE); and (3) lesions clearly visible on MRI. Exclusion criteria were: (1) severe image artifacts; and (2) lesions smaller than 1 cm. A total of 132 OAL and 106 IOI patients from Beijing Tongren Hospital were enrolled and randomly divided into training and internal test sets at a 7:3 ratio. Additionally, 31 OAL and 14 IOI patients from the Second Xiangya Hospital of Central South University during the same period were included in the external test set. Details of MRI acquisition are shown in Supplementary Table 1, https://links.lww.com/CM9/C203. The regions of interest (ROIs) for OAL and IOI were manually delineated on T1WI, T2WI, and T1CE images using ITK-SNAP (version 4.0.0, developed by the Penn Image Computing and Science Laboratory at the University of Pennsylvania, Philadelphia, USA, http://www.itksnap.org/) by a radiologist with 3 years of experience (Radiologist 1). These segmentations were then reviewed and adjusted by a senior radiologist with 10 years of experience (Radiologist 2). To assess intra-observer consistency, Radiologist 1 re-segmented the images of 30 randomly selected patients. Visual assessments were conducted independently by two radiologists blinded to the pathological findings. All MRI images underwent gray-level normalization (range 0–1024) before feature extraction. Radiomics features were extracted using the FeAture Explorer software (FAE; version 0.5.8, developed by East China Normal University and Siemens Healthineers Ltd., Shanghai, China; https://github.com/salan668/FAE), configured with PyRadiomics. In total, 1688 features were extracted from each original MRI sequence [Supplementary Table 2, https://links.lww.com/CM9/C203].
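The gray-level normalization and the 7:3 random split described above can be sketched as follows. This is a minimal illustration that assumes a simple linear min-max rescaling of each image to [0, 1024] and an unstratified random permutation of patients; the text does not specify either scheme, and the helper names `rescale_gray_levels` and `random_split` are hypothetical, not FAE or PyRadiomics functions.

```python
import numpy as np


def rescale_gray_levels(image, upper=1024.0):
    """Linearly map intensities to [0, upper] (assumed min-max scheme)."""
    img = np.asarray(image, dtype=float)
    lo, hi = img.min(), img.max()
    if hi == lo:
        # A constant image has no contrast to stretch; map it to zero.
        return np.zeros_like(img)
    return (img - lo) / (hi - lo) * upper


def random_split(n_patients, train_fraction=0.7, seed=0):
    """Randomly assign patient indices to training and internal test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_patients)
    n_train = int(round(train_fraction * n_patients))
    return order[:n_train], order[n_train:]


# Usage with the cohort size reported above (132 OAL + 106 IOI = 238 patients).
train_idx, test_idx = random_split(132 + 106)
toy_slice = np.array([[120, 340], [560, 980]])  # invented intensities
normalized = rescale_gray_levels(toy_slice)
```

Splitting at the patient level, as here, keeps all sequences from one patient in the same set. A stratified split by diagnosis is a common alternative, but the text does not say whether one was used.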
To balance the numbers of OAL and IOI samples, we applied the Synthetic Minority Oversampling Technique (SMOTE) to the features from each MRI sequence. To identify the best ML model for classifying OAL and IOI, we combined multiple normalization methods, dimension reduction and feature selection approaches, and classification methods. Features were normalized using Z-score, Min-Max, or Mean normalization. Feature dimensions were reduced using either the Pearson correlation coefficient (PCC), removing redundant features with PCC >0.99, or principal component analysis (PCA). Feature selection used analysis of variance (ANOVA), Relief, recursive feature elimination (RFE), or Kruskal–Wallis (KW), with 1–10 features selected by each method. Ten ML algorithms were used for classification: logistic regression (LR), support vector machine (SVM), random forests (RF), logistic regression via Lasso (LRLasso), linear discriminant analysis (LDA), AdaBoost (AB), autoencoder (AE), naive Bayes (NB), Gaussian process (GP), and decision tree (DT). This resulted in 2400 pipelines, calculated as 3 (normalization methods) × 2 (dimension reduction methods) × 4 (feature selection methods) × 10 (feature numbers) × 10 (classification methods) = 2400. Radiomics features from each MRI sequence were used to build models to distinguish OAL from IOI. We then combined T1WI, T2WI, and T1CE images to train another model and determine the optimal one. The workflow framework is illustrated in Figure 1.

Figure 1: The schematic diagram for the multiparametric MRI-based machine learning model construction for differential diagnosis between OAL and IOI.
AB: AdaBoost; AE: Autoencoder; ANOVA: Analysis of variance; AUC: Area under the receiver operating characteristic curve; DT: Decision tree; GLCM: Gray level co-occurrence matrix; GLDM: Gray level dependence matrix; GLRLM: Gray level run length matrix; GLSZM: Gray level size zone matrix; GP: Gaussian process; ICC: Intraclass correlation coefficient; IOI: Idiopathic orbital inflammation; KW: Kruskal–Wallis; LBP: Local binary pattern; LDA: Linear discriminant analysis; LR: Logistic regression; LRLasso: Logistic regression via Lasso; MRI: Magnetic resonance imaging; NB: Naive Bayes; NGTDM: Neighboring gray tone difference matrix; OAL: Ocular adnexal lymphoma; PCA: Principal component analysis; PCC: Pearson correlation coefficient; RF: Random forests; RFE: Recursive feature elimination; ROIs: Regions of interest; SVM: Support vector machine; T1CE: T1-weighted contrast-enhanced; T1WI: T1-weighted imaging; T2WI: T2-weighted imaging.

The t-test and chi-squared test were used to compare continuous and categorical variables, respectively. Intra-observer consistency was evaluated using the intraclass correlation coefficient (ICC). The chi-squared test was also used to compare diagnostic performance between visual assessment and the ML models. Five-fold cross-validation was applied to the training set. Model performance was assessed using receiver operating characteristic (ROC) curve analysis, quantified by the area under the ROC curve (AUC). The DeLong test was used to compare ROC curves between models. Accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated at the cutoff that maximized the Youden index. The 95% confidence intervals (CIs) were estimated by bootstrapping with 1000 replicates. Calibration was measured by the Brier score, which ranges from 0 to 1, with lower values indicating better calibration.
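SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below is a from-scratch NumPy illustration of that idea, not the implementation FAE uses; the neighbour count `k=5` and the helper names `smote` and `balance_with_smote` are assumptions. Oversampling should be applied to training data only, so that synthetic points never leak into the test sets.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic samples from minority-class rows X_min."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances among minority samples.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)                  # seed samples
    pick = neighbours[base, rng.integers(0, k, size=n_new)]  # one random neighbour each
    gap = rng.random((n_new, 1))                           # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[pick] - X_min[base])


def balance_with_smote(X, y, k=5, seed=0):
    """Oversample the smaller class of a binary problem until both are equal."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_new = counts.max() - counts.min()
    X_syn = smote(X[y == minority], n_new, k=k, rng=seed)
    return np.vstack([X, X_syn]), np.concatenate([y, np.full(n_new, minority)])


# Toy training set: 92 "OAL" (label 1) and 74 "IOI" (label 0) rows of random features.
rng = np.random.default_rng(1)
X = rng.normal(size=(166, 4))
y = np.array([1] * 92 + [0] * 74)
X_bal, y_bal = balance_with_smote(X, y)
```

Because each synthetic row is a convex combination of two real minority rows, it always lies inside the bounding box of the minority class. The imbalanced-learn package offers a maintained `SMOTE` class if a library implementation is preferred.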
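The three normalization options can be written as per-feature transforms whose statistics are learned on the training set and then reused unchanged on test data. This is a sketch under an explicit assumption: "Mean" normalization is taken here to mean the textbook form (x − mean) / (max − min), and FAE's own definition may differ. The function name `fit_normalizer` is hypothetical.

```python
import numpy as np


def fit_normalizer(X_train, method):
    """Learn normalization statistics on training data only.

    method: "zscore", "minmax", or "mean" (mean normalization is assumed to be
    (x - mean) / (max - min)). Returns a function that applies the transform.
    """
    X_train = np.asarray(X_train, dtype=float)
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    lo = X_train.min(axis=0)
    spread = X_train.max(axis=0) - lo
    # Constant features would cause division by zero; leave their scale at 1.
    std = np.where(std > 0, std, 1.0)
    spread = np.where(spread > 0, spread, 1.0)
    if method == "zscore":
        return lambda X: (np.asarray(X, dtype=float) - mean) / std
    if method == "minmax":
        return lambda X: (np.asarray(X, dtype=float) - lo) / spread
    if method == "mean":
        return lambda X: (np.asarray(X, dtype=float) - mean) / spread
    raise ValueError(f"unknown normalization method: {method}")


rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # synthetic features
scaled = {m: fit_normalizer(X_train, m)(X_train) for m in ("zscore", "minmax", "mean")}
```

Fitting on the training set and applying the stored statistics to the internal and external test sets mirrors how a deployed model would see new patients.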
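The PCC-based reduction removes features that are almost perfect linear copies of features already kept. A minimal greedy sketch follows, assuming features are visited in column order and the first member of each highly correlated group is retained; which member FAE keeps is not stated in the text, and `drop_correlated_features` is a hypothetical name.

```python
import numpy as np


def drop_correlated_features(X, threshold=0.99):
    """Greedy PCC filter: keep a feature only if |r| <= threshold with every
    feature already kept. Returns the indices of retained columns."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if std[j] == 0:
            continue  # constant features have undefined correlation; drop them
        if all(corr[j, i] <= threshold for i in kept):
            kept.append(j)
    return kept


rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, 2.0 * a + 1.0, b])  # column 1 is a linear copy of column 0
kept = drop_correlated_features(X)
```

Here column 1 has r = 1 with column 0 and is dropped, while the independent column 2 is kept.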
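The pipeline count is a Cartesian product of the options at each stage. The snippet enumerates the candidates using the component labels listed in the text. The labels are plain strings for illustration, not FAE identifiers.

```python
from itertools import product

# Options per stage, as listed in the text.
NORMALIZERS = ["Z-score", "Min-Max", "Mean"]
REDUCERS = ["PCC", "PCA"]
SELECTORS = ["ANOVA", "Relief", "RFE", "KW"]
FEATURE_COUNTS = range(1, 11)
CLASSIFIERS = ["LR", "SVM", "RF", "LRLasso", "LDA", "AB", "AE", "NB", "GP", "DT"]

# Every combination of one option per stage is one candidate pipeline.
pipelines = list(product(NORMALIZERS, REDUCERS, SELECTORS, FEATURE_COUNTS, CLASSIFIERS))
print(len(pipelines))  # 2400
```

The pipeline reported as optimal corresponds to the tuple `("Mean", "PCA", "ANOVA", 10, "LR")`.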
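The text does not state which ICC form was used for intra-observer consistency. The sketch below computes two common two-way forms (Shrout–Fleiss notation) for one feature measured on the original and repeated segmentations: ICC(2,1), absolute agreement, and ICC(3,1), consistency. In practice this would be repeated for every radiomics feature, giving a range of values such as the one reported. The function name `icc_two_way` and the simulated measurements are assumptions.

```python
import numpy as np


def icc_two_way(ratings):
    """Return (ICC(2,1), ICC(3,1)) for an (n_subjects, k_repeats) array."""
    Y = np.asarray(ratings, dtype=float)
    n, k = Y.shape
    grand = Y.mean()
    ss_rows = k * np.sum((Y.mean(axis=1) - grand) ** 2)  # between patients
    ss_cols = n * np.sum((Y.mean(axis=0) - grand) ** 2)  # between segmentations
    ss_err = np.sum((Y - grand) ** 2) - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    icc_agreement = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    icc_consistency = (msr - mse) / (msr + (k - 1) * mse)
    return icc_agreement, icc_consistency


# One feature measured on 30 patients, segmented twice (simulated values).
rng = np.random.default_rng(0)
first = rng.normal(size=30)
second = first + rng.normal(scale=0.3, size=30)  # repeated delineation adds noise
icc21, icc31 = icc_two_way(np.column_stack([first, second]))
```

The two forms differ in how they treat a systematic shift between segmentations: a constant offset leaves ICC(3,1) at 1 but lowers ICC(2,1).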
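The threshold-dependent metrics and the Brier score can be computed from predicted probabilities as sketched below. This assumes label 1 is the positive class (for example OAL) and that a case is called positive when its score is at or above the cutoff; when several cutoffs tie on the Youden index, the lowest one is kept. The function names are hypothetical.

```python
import numpy as np


def metrics_at_youden_cutoff(y_true, score):
    """Pick the cutoff maximizing sensitivity + specificity - 1, then report
    accuracy, sensitivity, specificity, PPV and NPV. Both classes must be present."""
    y = np.asarray(y_true).astype(int)
    s = np.asarray(score, dtype=float)
    best = None
    for cutoff in np.unique(s):  # ascending candidate cutoffs
        pred = s >= cutoff
        tp = int(np.sum(pred & (y == 1)))
        fp = int(np.sum(pred & (y == 0)))
        tn = int(np.sum(~pred & (y == 0)))
        fn = int(np.sum(~pred & (y == 1)))
        youden = tp / (tp + fn) + tn / (tn + fp) - 1
        if best is None or youden > best[0]:
            best = (youden, cutoff, tp, fp, tn, fn)
    _, cutoff, tp, fp, tn, fn = best
    return {
        "cutoff": float(cutoff),
        "accuracy": (tp + tn) / len(y),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }


def brier_score(y_true, prob):
    """Mean squared difference between predicted probability and outcome (0 to 1)."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(prob, dtype=float)
    return float(np.mean((p - y) ** 2))


y = [0, 0, 1, 1]
p = [0.1, 0.2, 0.7, 0.9]
report = metrics_at_youden_cutoff(y, p)
brier = brier_score(y, p)  # (0.01 + 0.04 + 0.09 + 0.01) / 4 = 0.0375
```

A Brier score of 0 means perfectly confident and correct predictions; a constant prediction of 0.5 scores 0.25.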
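A 95% CI from 1000 bootstrap replicates can be obtained by resampling patients with replacement and taking percentiles of the resampled AUCs. The sketch below assumes the simple percentile method; the text does not say whether a percentile or bias-corrected interval was used. The AUC is computed as the Mann–Whitney probability that a random positive case scores higher than a random negative case.

```python
import numpy as np


def auc(y, s):
    """Mann-Whitney estimate of the ROC AUC (ties count one half)."""
    y = np.asarray(y)
    s = np.asarray(s, dtype=float)
    diff = s[y == 1][:, None] - s[y == 0][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def bootstrap_auc_ci(y, s, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample patients with replacement n_boot times."""
    y = np.asarray(y)
    s = np.asarray(s, dtype=float)
    rng = np.random.default_rng(seed)
    stats = []
    while len(stats) < n_boot:
        idx = rng.integers(0, len(y), size=len(y))
        if np.unique(y[idx]).size < 2:
            continue  # AUC is undefined when a resample contains one class only
        stats.append(auc(y[idx], s[idx]))
    lower, upper = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return auc(y, s), float(lower), float(upper)


# Simulated scores: positives shifted upward relative to negatives.
rng = np.random.default_rng(0)
y = np.array([0] * 40 + [1] * 40)
s = np.concatenate([rng.normal(0.0, 1.0, 40), rng.normal(1.5, 1.0, 40)])
point, lower, upper = bootstrap_auc_ci(y, s)
```

Small test sets widen the interval, which is consistent with the broader CI reported for the 45-patient external set.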
Analyses were conducted using FAE in Python (version 3.7.6, Python Software Foundation, 9450 SW Gemini Dr., ECM# 90772, Beaverton, OR 97008, USA) and Statistical Product and Service Solutions (SPSS, version 20.0, SPSS Inc., Chicago, USA). A P-value <0.05 was considered statistically significant. In both the training and internal test sets, patients with OAL were older and more often male than those with IOI (all P <0.05). In the external test set, patients with OAL were also older (P = 0.020), but the sex distribution did not differ significantly (P = 0.072). Lesion side distribution did not differ significantly in any set (all P >0.05) [Supplementary Table 3, https://links.lww.com/CM9/C203]. The ICC values ranged from 0.815 to 0.915 (P <0.001), indicating satisfactory repeatability of feature extraction. For the differential diagnosis model combining T1WI, T2WI, and T1CE, the pipeline with Mean normalization, PCA, ANOVA, and LR achieved the highest AUC. Ten features were retained in this model using the one-standard-error rule. The AUCs were 0.921 (95% CI: 0.876–0.966), 0.900 (95% CI: 0.851–0.948), 0.849 (95% CI: 0.759–0.940), and 0.786 (95% CI: 0.653–0.918) in the training, validation, internal test, and external test sets, respectively [Supplementary Table 4, Supplementary Figure 1, https://links.lww.com/CM9/C203]. These AUCs were higher than those of the single-sequence models. However, the differences in AUC among the four models in the internal and external test sets were not statistically significant (DeLong test, all P >0.05). Nevertheless, the combined model had the lowest Brier scores, 0.155 (internal test set) and 0.190 (external test set), indicating good calibration. The ML model based on multi-sequence MRI outperformed a junior radiologist and matched the performance of a senior radiologist [Supplementary Table 5, https://links.lww.com/CM9/C203].
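The one-standard-error rule picks the simplest model whose cross-validated performance lies within one standard error of the best. The sketch below applies it to the number of selected features, assuming per-fold validation AUCs are available for each candidate count. In the reported model the rule retained ten features; the fold AUCs in the usage example are invented to show the rule preferring a smaller count and are not results from this study. The function name is hypothetical.

```python
import numpy as np


def one_standard_error_choice(fold_aucs):
    """fold_aucs maps a candidate feature count to its per-fold validation AUCs.
    Returns the smallest count whose mean AUC is within one standard error of
    the best mean AUC."""
    summary = {}
    for count, scores in fold_aucs.items():
        s = np.asarray(scores, dtype=float)
        summary[count] = (s.mean(), s.std(ddof=1) / np.sqrt(len(s)))
    best = max(summary, key=lambda c: summary[c][0])
    best_mean, best_se = summary[best]
    eligible = [c for c, (mean, _) in summary.items() if mean >= best_mean - best_se]
    return min(eligible)


# Invented five-fold AUCs, for illustration only.
fold_aucs = {
    1: [0.70, 0.72, 0.71, 0.69, 0.73],
    5: [0.86, 0.86, 0.85, 0.86, 0.87],
    10: [0.86, 0.87, 0.85, 0.86, 0.88],
}
chosen = one_standard_error_choice(fold_aucs)
```

Here ten features give the best mean AUC (0.864), but five features (0.860) fall within one standard error of it, so the rule returns 5.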
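The DeLong test compares two correlated AUCs obtained on the same patients by estimating the covariance of their Mann–Whitney placement values. Below is a compact NumPy sketch of the classic DeLong variance estimate with a two-sided normal p-value. It is an illustration under that assumption, not the code used by FAE or SPSS; `delong_test` and the simulated scores are hypothetical.

```python
import math

import numpy as np


def delong_test(y, score_a, score_b):
    """Two-sided DeLong test for two correlated ROC AUCs on the same cases.
    Returns (auc_a, auc_b, z, p_value)."""
    y = np.asarray(y)
    pos, neg = y == 1, y == 0
    m, n = int(pos.sum()), int(neg.sum())

    def placements(score):
        s = np.asarray(score, dtype=float)
        diff = s[pos][:, None] - s[neg][None, :]
        psi = (diff > 0) + 0.5 * (diff == 0)  # kernel for every positive/negative pair
        return psi.mean(), psi.mean(axis=1), psi.mean(axis=0)

    auc_a, v10_a, v01_a = placements(score_a)
    auc_b, v10_b, v01_b = placements(score_b)
    s10 = np.cov(np.vstack([v10_a, v10_b]))  # covariance over positive cases
    s01 = np.cov(np.vstack([v01_a, v01_b]))  # covariance over negative cases
    cov = s10 / m + s01 / n
    var = cov[0, 0] + cov[1, 1] - 2 * cov[0, 1]
    if var <= 1e-12:
        return auc_a, auc_b, 0.0, 1.0  # identical rankings: no detectable difference
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return auc_a, auc_b, z, p


# Two simulated models scored on the same 100 cases.
rng = np.random.default_rng(0)
y = np.array([0] * 50 + [1] * 50)
a = y + rng.normal(scale=0.8, size=100)  # stronger model
b = y + rng.normal(scale=2.0, size=100)  # weaker model
auc_a, auc_b, z, p = delong_test(y, a, b)
```

Because both AUCs come from the same patients, the covariance term matters. Ignoring it and comparing the AUCs as if they were independent usually overstates the variance of the difference.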
Several prior studies have shown that MRI radiomics may be able to differentiate OAL from IOI [Supplementary Table 6, https://links.lww.com/CM9/C203]. However, they were conducted at single institutions, with relatively small sample sizes and a single algorithm, so their diagnostic performance needs further improvement. The present study differs from previous studies in employing a range of methods and algorithms to create 2400 processing pipelines for multiparametric MRI data. The optimal pipeline, configured with Mean normalization, PCA, ANOVA, and LR on the combination of T1WI, T2WI, and T1CE images, achieved AUCs of 0.849 and 0.786 in the internal and external test sets, respectively, surpassing previously reported findings. We assessed our study using the Radiomics Quality Score (RQS),[4] obtaining a score of 15. This is higher than the average RQS of 11.17 reported in a recent systematic review of ophthalmic radiomics studies.[5] The review highlighted limitations such as small sample sizes (median of 110 participants) and few studies with prospective designs or multicenter validation. Our study addresses these by including a relatively large cohort (133 OALs and 106 IOIs) and an external validation set (31 OALs and 14 IOIs). The review also noted a lack of open data or code in many studies. In contrast, our study used the open-source tool FAE for radiomics analysis, making it more accessible to researchers pursuing similar work. We acknowledge a risk of overfitting due to the large number of features and the relatively small sample size. To mitigate this, we used feature selection methods such as ANOVA and dimension reduction methods such as PCC and PCA. In addition, we applied five-fold cross-validation to ensure robust model evaluation.
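The optimal configuration (Mean normalization, PCA, ANOVA, LR) evaluated with five-fold cross-validation can be sketched with scikit-learn as a stand-in; this is not FAE's code. Several details are assumptions: "Mean" normalization is taken as (x − mean) / (max − min) via a hypothetical `MeanNormalizer` class, PCA keeps all components before ANOVA ranks them, and the data are synthetic. Placing every step inside one `Pipeline` means normalization, PCA and feature selection are refitted on each training fold, which avoids leaking information from validation folds.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline


class MeanNormalizer(BaseEstimator, TransformerMixin):
    """Hypothetical 'Mean' normalization: (x - mean) / (max - min), fitted per fold."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        spread = X.max(axis=0) - X.min(axis=0)
        self.spread_ = np.where(spread > 0, spread, 1.0)  # guard constant features
        return self

    def transform(self, X):
        return (np.asarray(X, dtype=float) - self.mean_) / self.spread_


def build_pipeline(n_features=10):
    """Mean normalization -> PCA -> ANOVA F-test selection -> logistic regression."""
    return Pipeline([
        ("normalize", MeanNormalizer()),
        ("pca", PCA()),  # keeps all components; ANOVA then ranks them
        ("anova", SelectKBest(score_func=f_classif, k=n_features)),
        ("classifier", LogisticRegression(max_iter=1000)),
    ])


# Synthetic stand-in for a radiomics feature matrix (not study data).
rng = np.random.default_rng(0)
y = np.array([0] * 70 + [1] * 90)
X = rng.normal(size=(160, 40))
X[:, :3] += 1.5 * y[:, None]  # three informative features

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_aucs = cross_val_score(build_pipeline(10), X, y, cv=cv, scoring="roc_auc")
print(fold_aucs.mean())
```

If SMOTE is part of the workflow, imbalanced-learn's `Pipeline` can hold an oversampling step so that it is applied only to the training portion of each fold.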
Finally, the optimal model achieved satisfactory AUCs of 0.849 and 0.786 in the internal and external test sets, respectively, indicating promising diagnostic performance. The superior performance of the radiomics model combining T1WI, T2WI, and T1CE can be attributed to several factors: (1) integrating multiple MRI sequences captures a more comprehensive set of imaging features, enhancing the model's ability to distinguish OAL from IOI; (2) PCA for dimension reduction, ANOVA for feature selection, and LR for classification help ensure that the most relevant and discriminative features are used; and (3) the high AUCs of 0.921, 0.900, 0.849, and 0.786 in the training, validation, internal test, and external test sets suggest that the model is robust and generalizes across datasets. Clinically, this model advances the non-invasive differentiation of OAL from IOI, potentially reducing the need for biopsy and improving treatment decisions. This study has several limitations. First, because OAL comprises various histologic subtypes, future studies with larger samples should include detailed subgroup analyses. Second, the model relied on manual segmentation of orbital lesions, which is labor-intensive and time-consuming; automated segmentation should be considered in future work. Third, although diffusion-weighted imaging (DWI), apparent diffusion coefficient (ADC) maps, and dynamic contrast-enhanced (DCE) MRI can provide valuable information for distinguishing OAL from IOI, these sequences were not included in this study. Finally, the retrospective design may introduce selection bias. In conclusion, this study developed an ML model using radiomics features from T1WI, T2WI, and T1CE MRI data to distinguish OAL from IOI. The optimal pipeline consisted of Mean normalization, PCA for dimension reduction, ANOVA for feature selection, and LR for classification.
This method shows great promise as a valuable tool for differential diagnosis between OAL and IOI, especially for radiology residents with limited head and neck imaging experience.

Funding

The study was supported by National Health Commission's Capacity Building and Continuing Education Center (No. YXFSC2022JJSJ009); Beijing Municipal Administration of Hospitals' Ascent Plan (No. DFL20190203); Beijing Postdoctoral Research Foundation (No. 2023-ZZ-027); National Key R&D Program of China (No. 2022YFC2404005).

Conflicts of interest

None.
