Generalizability and Diagnostic Performance of AI Models for Thyroid US

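Keywords: Medicine; Generalizability theory; Receiver operating characteristic; Thyroid nodules; Segmentation; Sørensen–Dice coefficient; Cohen's kappa; Artificial intelligence; Dice; Retrospective cohort study; Machine learning; Radiology; Thyroid; Statistics; Surgery; Image segmentation; Computer science; Internal medicine; Mathematics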

Authors

Wenwen Xu, Xiaohong Jia, Zihan Mei, Xiaolin Gu, Yang Lu, Chi-Cheng Fu, Ruifang Zhang, Ying Gu, Xia Chen, Xiaomao Luo, Ning Li, Baoyan Bai, Qiaoying Li, Jiping Yan, Zhai Hong, Ling Guan, Bing Gong, Keyang Zhao, Qu Fang, Chuan He

Source

Journal: Radiology [Radiological Society of North America]
Date: 2023-06-01 Volume (issue): 307 (5) Citations: 17

Identifier

DOI: 10.1148/radiol.221157

Abstract

Background: Artificial intelligence (AI) models have improved US assessment of thyroid nodules; however, the lack of generalizability limits the application of these models.

Purpose: To develop AI models for segmentation and classification of thyroid nodules in US using diverse data sets from nationwide hospitals and multiple vendors, and to measure the impact of the AI models on diagnostic performance.

Materials and Methods: This retrospective study included consecutive patients with pathologically confirmed thyroid nodules who underwent US using equipment from 12 vendors at 208 hospitals across China from November 2017 to January 2019. The detection, segmentation, and classification models were developed based on the subset or complete set of images. Model performance was evaluated by precision and recall, Dice coefficient, and area under the receiver operating characteristic curve (AUC) analyses. Three scenarios (diagnosis without AI assistance, with freestyle AI assistance, and with rule-based AI assistance) were compared with three senior and three junior radiologists to optimize incorporation of AI into clinical practice.

Results: A total of 10 023 patients (median age, 46 years [IQR 37–55 years]; 7669 female) were included. The detection, segmentation, and classification models had an average precision, Dice coefficient, and AUC of 0.98 (95% CI: 0.96, 0.99), 0.86 (95% CI: 0.86, 0.87), and 0.90 (95% CI: 0.88, 0.92), respectively. The segmentation model trained on the nationwide data and classification model trained on the mixed vendor data exhibited the best performance, with a Dice coefficient of 0.91 (95% CI: 0.90, 0.91) and AUC of 0.98 (95% CI: 0.97, 1.00), respectively. The AI model outperformed all senior and junior radiologists (P < .05 for all comparisons), and the diagnostic accuracies of all radiologists were improved (P < .05 for all comparisons) with rule-based AI assistance.

Conclusion: Thyroid US AI models developed from diverse data sets had high diagnostic performance among the Chinese population. Rule-based AI assistance improved the performance of radiologists in thyroid cancer diagnosis.

© RSNA, 2023. Supplemental material is available for this article.
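
For readers unfamiliar with two of the metrics named in the abstract, the following is a minimal illustrative sketch of how a Dice coefficient and an AUC can be computed. It is not the authors' code: the function names `dice_coefficient` and `auc`, the flat 0/1 mask representation, and the toy inputs are all assumptions made for illustration, and the study's actual evaluation pipeline is not described in the abstract.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks.

    Masks are flat sequences of 0/1 values of equal length
    (an assumed representation, chosen to avoid dependencies).
    """
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    if total == 0:
        # Both masks empty: treated as perfect agreement (a convention assumed here).
        return 1.0
    return 2.0 * intersection / total


def auc(scores, labels):
    """AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy data, not from the study.
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]))
    print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

The pairwise AUC loop is quadratic in the number of cases, which keeps the definition visible; for data sets of the size reported here, a rank-based or library implementation would be the practical choice.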
