Authors
Jeong Hoon Lee, Eun Ju Ha, Dayoung Kim, Yong Jun Jung, Subin Heo, Yong-ho Jang, Sung Hyun An, Kyung‐Min Lee
Abstract
This study aimed to validate a deep learning model's diagnostic performance in using computed tomography (CT) to diagnose cervical lymph node metastasis (LNM) from thyroid cancer in a large clinical cohort and to evaluate the model's clinical utility for resident training. The performance of eight deep learning models was validated using 3838 axial CT images from 698 consecutive patients with thyroid cancer who underwent preoperative CT imaging between January and August 2018 (3606 and 232 images from benign and malignant lymph nodes, respectively). Six trainees viewed the same patient images (n = 242), and their diagnostic performance and confidence level (5-point scale) were assessed before and after computer-aided diagnosis (CAD) was included. The overall area under the receiver operating characteristic curve (AUROC) of the eight deep learning algorithms was 0.846 (range 0.784–0.884). The best performing model was Xception, with an AUROC of 0.884. The diagnostic accuracy, sensitivity, specificity, positive predictive value, and negative predictive value of Xception were 82.8%, 80.2%, 83.0%, 83.0%, and 80.2%, respectively. After introducing the CAD system, underperforming trainees received more help from artificial intelligence than the higher performing trainees (p = 0.046), and overall confidence levels significantly increased from 3.90 to 4.30 (p < 0.001). The deep learning–based CAD system used in this study for CT diagnosis of cervical LNM from thyroid cancer was clinically validated with an AUROC of 0.884. This approach may serve as a training tool to help resident physicians gain confidence in diagnosis.
• A deep learning-based CAD system for CT diagnosis of cervical LNM from thyroid cancer was validated using data from a clinical cohort. The AUROC for the eight tested algorithms ranged from 0.784 to 0.884.
• Of the eight models, the Xception algorithm performed best on the external validation dataset, with an AUROC of 0.884. The accuracy, sensitivity, specificity, positive predictive value, and negative predictive value were 82.8%, 80.2%, 83.0%, 83.0%, and 80.2%, respectively.
• The CAD system exhibited potential to improve diagnostic specificity and accuracy in underperforming trainees (3 of 6 trainees, 50.0%). This approach may have clinical utility as a training tool to help trainees to gain confidence in diagnoses.
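The performance measures named above (accuracy, sensitivity, specificity, positive and negative predictive value, and AUROC) can be illustrated with a minimal sketch. It uses the standard textbook definitions and is not the authors' evaluation code. The names `ConfusionCounts`, `diagnostic_metrics`, and `auroc` are hypothetical, and the counts and scores in the usage example are made up for illustration, not taken from the study.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for per-image classification counts."""
    tp: int  # malignant nodes called malignant
    fp: int  # benign nodes called malignant
    tn: int  # benign nodes called benign
    fn: int  # malignant nodes called benign


def diagnostic_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Standard diagnostic measures computed from a 2x2 confusion table."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


def auroc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Illustrative inputs only (assumed values, not data from the study).
example_counts = ConfusionCounts(tp=80, fp=10, tn=90, fn=20)
example_metrics = diagnostic_metrics(example_counts)
example_auroc = auroc([0.9, 0.8, 0.4], [0.7, 0.3, 0.4])
```

The pairwise AUROC computation is quadratic in the number of images and is written for clarity. Unlike sensitivity and specificity, the predictive values depend on the proportion of malignant cases in the evaluated set, which is worth keeping in mind when comparing them across cohorts.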