A Deep Learning Model to Triage Screening Mammograms: A Simulation Study

医学急诊分诊台乳腺摄影术医学物理学乳腺X光筛查深度学习放射科人工智能医疗急救内科学癌症乳腺癌计算机科学

作者

Adam Yala,Tal Schuster,Randy C. Miles,Regina Barzilay,Constance D. Lehman

出处

期刊：Radiology [Radiological Society of North America]
日期：2019-08-06 卷期号：293 (1): 38-46 被引量：172

链接

nih.govdoi.org

标识

DOI：10.1148/radiol.2019182908

摘要

Background Recent deep learning (DL) approaches have shown promise in improving sensitivity but have not addressed limitations in radiologist specificity or efficiency. Purpose To develop a DL model to triage a portion of mammograms as cancer free, improving performance and workflow efficiency. Materials and Methods In this retrospective study, 223 109 consecutive screening mammograms performed in 66 661 women from January 2009 to December 2016 were collected with cancer outcomes obtained through linkage to a regional tumor registry. This cohort was split by patient into 212 272, 25 999, and 26 540 mammograms from 56 831, 7021, and 7176 patients for training, validation, and testing, respectively. A DL model was developed to triage mammograms as cancer free and evaluated on the test set. A DL-triage workflow was simulated in which radiologists skipped mammograms triaged as cancer free (interpreting them as negative for cancer) and read mammograms not triaged as cancer free by using the original interpreting radiologists' assessments. Sensitivities, specificities, and percentage of mammograms read were calculated, with and without the DL-triage-simulated workflow. Statistics were computed across 5000 bootstrap samples to assess confidence intervals (CIs). Specificities were compared by using a two-tailed t test (P < .05) and sensitivities were compared by using a one-sided t test with a noninferiority margin of 5% (P < .05). Results The test set included 7176 women (mean age, 57.8 years ± 10.9 [standard deviation]). When reading all mammograms, radiologists obtained a sensitivity and specificity of 90.6% (173 of 191; 95% CI: 86.6%, 94.7%) and 93.5% (24 625 of 26 349; 95% CI: 93.3%, 93.9%). In the DL-simulated workflow, the radiologists obtained a sensitivity and specificity of 90.1% (172 of 191; 95% CI: 86.0%, 94.3%) and 94.2% (24 814 of 26 349; 95% CI: 94.0%, 94.6%) while reading 80.7% (21 420 of 26 540) of the mammograms. The simulated workflow improved specificity (P = .002) and obtained a noninferior sensitivity with a margin of 5% (P < .001). Conclusion This deep learning model has the potential to reduce radiologist workload and significantly improve specificity without harming sensitivity. © RSNA, 2019 Online supplemental material is available for this article. See also the editorial by Kontos and Conant in this issue.

求助该文献

最长约 10秒，即可获得该文献文件

A Deep Learning Model to Triage Screening Mammograms: A Simulation Study

今日热心研友