Deep Learning to Differentiate Benign and Malignant Vertebral Fractures at Multidetector CT
Medicine
Receiver operating characteristic
Radiology
Internal medicine
Authors
Sarah C. Foreman, David Schinz, Malek El Husseini, Sophia S. Goller, Jürgen Weißinger, Anna-Sophia Dietrich, Martin Renz, Marie‐Christin Metz, Georg C. Feuerriegel, Benedikt Wiestler, Robert Stahl, Benedikt J. Schwaiger, Marcus R. Makowski, Jan S. Kirschke, Alexandra S. Gersing
Source
Journal: Radiology [Radiological Society of North America]; Date: 2024-03-01; Volume (issue): 310 (3); Cited by: 4
Background: Differentiating between benign and malignant vertebral fractures poses diagnostic challenges.

Purpose: To investigate the reliability of CT-based deep learning models in differentiating between benign and malignant vertebral fractures.

Materials and Methods: CT scans acquired in patients with benign or malignant vertebral fractures from June 2005 to December 2022 at two university hospitals were retrospectively identified based on a composite reference standard that included histopathologic and radiologic information. An internal test set was randomly selected, and an external test set was obtained from an additional hospital. Models used a three-dimensional U-Net encoder-classifier architecture and applied data augmentation during training. Performance was evaluated using the area under the receiver operating characteristic curve (AUC) and compared with that of two residents and one fellowship-trained radiologist using the DeLong test.

Results: The training set included 381 patients (mean age, 69.9 years ± 11.4 [SD]; 193 male) with 1307 vertebrae (378 benign fractures, 447 malignant fractures, 482 malignant lesions). The internal and external test sets included 86 patients (mean age, 66.9 years ± 12; 45 male) and 65 patients (mean age, 68.8 years ± 12.5; 39 female), respectively. The better-performing model of the two training approaches achieved AUCs of 0.85 (95% CI: 0.77, 0.92) in the internal test set and 0.75 (95% CI: 0.64, 0.85) in the external test set. Including an uncertainty category further improved performance to AUCs of 0.91 (95% CI: 0.83, 0.97) in the internal test set and 0.76 (95% CI: 0.64, 0.88) in the external test set. The AUCs of the two residents were lower than that of the best-performing model in both the internal test set (AUC, 0.69 [95% CI: 0.59, 0.78] and 0.71 [95% CI: 0.61, 0.80]) and the external test set (AUC, 0.70 [95% CI: 0.58, 0.80] and 0.71 [95% CI: 0.60, 0.82]), with significant differences only for the internal test set (
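The abstract compares model and reader AUCs with the DeLong test. Because the model and each reader score the same cases, their AUCs are correlated, and DeLong's method accounts for that correlation through per-case "structural components". The following is a minimal pure-Python sketch of that test, not the authors' code. It assumes binary labels (1 = malignant, 0 = benign) and scores where higher means "more likely malignant" (for example, a model probability or a reader confidence rating). The names `delong_test`, `_psi`, `_structural_components`, and `_sample_cov` are hypothetical. The abstract does not say how its 95% CIs were obtained, so the Wald-type interval built from the DeLong variance here is only one common option.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _psi(pos_score: float, neg_score: float) -> float:
    """Mann-Whitney kernel: 1 if the positive case scores higher, 0.5 on a tie, else 0."""
    if pos_score > neg_score:
        return 1.0
    if pos_score == neg_score:
        return 0.5
    return 0.0


def _structural_components(
    pos: Sequence[float], neg: Sequence[float]
) -> Tuple[float, List[float], List[float]]:
    """Return the AUC and DeLong's per-case components V10 (positives) and V01 (negatives)."""
    m, n = len(pos), len(neg)
    v10 = [sum(_psi(x, y) for y in neg) / n for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / m for y in neg]
    return sum(v10) / m, v10, v01


def _sample_cov(a: Sequence[float], b: Sequence[float]) -> float:
    """Unbiased sample covariance (denominator k - 1); 0.0 if fewer than two values."""
    k = len(a)
    if k < 2:
        return 0.0
    mean_a, mean_b = sum(a) / k, sum(b) / k
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (k - 1)


def delong_test(
    labels: Sequence[int], scores_a: Sequence[float], scores_b: Sequence[float]
) -> Dict[str, object]:
    """Compare two correlated AUCs computed on the same cases (hypothetical helper).

    labels: 1 = positive (assumed: malignant), 0 = negative (assumed: benign).
    scores_a / scores_b: higher values mean "more likely positive".
    """
    if not (len(labels) == len(scores_a) == len(scores_b)):
        raise ValueError("labels and both score lists must have the same length")
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    if not pos_idx or not neg_idx:
        raise ValueError("need at least one positive and one negative case")
    m, n = len(pos_idx), len(neg_idx)

    auc_a, v10_a, v01_a = _structural_components(
        [scores_a[i] for i in pos_idx], [scores_a[i] for i in neg_idx])
    auc_b, v10_b, v01_b = _structural_components(
        [scores_b[i] for i in pos_idx], [scores_b[i] for i in neg_idx])

    # Variance of each AUC and covariance between them (same cases, so correlated).
    var_a = _sample_cov(v10_a, v10_a) / m + _sample_cov(v01_a, v01_a) / n
    var_b = _sample_cov(v10_b, v10_b) / m + _sample_cov(v01_b, v01_b) / n
    cov_ab = _sample_cov(v10_a, v10_b) / m + _sample_cov(v01_a, v01_b) / n
    var_diff = var_a + var_b - 2.0 * cov_ab

    # Two-sided z test on the AUC difference: p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    if var_diff <= 0.0:
        z, p_value = 0.0, 1.0  # degenerate case, e.g. identical score vectors
    else:
        z = (auc_a - auc_b) / math.sqrt(var_diff)
        p_value = math.erfc(abs(z) / math.sqrt(2.0))

    def ci(auc: float, var: float) -> Tuple[float, float]:
        # Wald-type 95% CI from the DeLong variance, clipped to [0, 1].
        half = 1.96 * math.sqrt(max(var, 0.0))
        return max(0.0, auc - half), min(1.0, auc + half)

    return {"auc_a": auc_a, "auc_b": auc_b, "ci_a": ci(auc_a, var_a),
            "ci_b": ci(auc_b, var_b), "z": z, "p_value": p_value}


if __name__ == "__main__":
    # Hypothetical toy data: 3 malignant (1) and 3 benign (0) cases.
    labels = [1, 1, 1, 0, 0, 0]
    model = [0.9, 0.8, 0.4, 0.3, 0.5, 0.1]   # e.g. model probability of malignancy
    reader = [0.6, 0.7, 0.2, 0.8, 0.4, 0.3]  # e.g. reader confidence rating
    result = delong_test(labels, model, reader)
    print(round(result["auc_a"], 3), round(result["auc_b"], 3))  # 0.889 0.444
    print(result["p_value"])
```

This sketch compares every positive with every negative, which takes O(m·n) time. That is adequate for test sets of roughly 100 patients, and midrank-based implementations are faster for larger sets. It also treats every scored case as independent. If several vertebrae come from one patient, as in this study's vertebra-level data, a clustered variant or patient-level analysis may be more appropriate. This is an assumption, because the abstract does not state the unit of analysis.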