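Medicine
National Lung Screening Trial
Context (archaeology)
Deep learning
Lung cancer
Radiology
Artificial intelligence
Population
Lung cancer screening
Cancer
Computer science
Computed tomography
Machine learning
Internal medicine
Paleontology
Environmental health
Biology
Authors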
Stojan Trajanovski,Dimitrios Mavroeidis,Christine Leon Swisher,Binyam Gebrekidan Gebre,Bastiaan S. Veeling,Rafael Wiemker,Tobias Klinder,Amir Tahmasebi,Shawn M. Regis,Christoph Wald,Brady J. McKee,Sebastian Flacke,Heber MacMahon,Homer Pien
Identifier
DOI:10.1016/j.compmedimag.2021.101883
Abstract
Lung cancer is the leading cause of cancer mortality in the US, responsible for more deaths than breast, prostate, colon and pancreatic cancer combined. Large population studies have indicated that low-dose computed tomography (CT) screening of the chest can significantly reduce this death rate. Recently, the usefulness of Deep Learning (DL) models for lung cancer risk assessment has been demonstrated. However, in many cases model performance is evaluated on small or medium-sized test sets, which does not provide the strong guarantees of generalization and stability that clinical adoption requires. In this work, our goal is to contribute towards clinical adoption by investigating a deep learning framework on larger and heterogeneous datasets while also comparing to state-of-the-art models. Three low-dose CT lung cancer screening datasets were used: National Lung Screening Trial (NLST, n = 3410) data, Lahey Hospital and Medical Center (LHMC, n = 3154) data, and Kaggle competition data (from both stages, n = 1397 + 505), together with the University of Chicago data (UCM, a subset of NLST, annotated by radiologists, n = 132). In the first stage, our framework employs a nodule detector; in the second stage, we use both the image context around the nodules and nodule features as inputs to a neural network that estimates the malignancy risk for the entire CT scan. We trained our algorithm on a part of the NLST dataset and validated it on the other datasets. 
Special care was taken to ensure there was no patient overlap between the train and validation sets. The proposed deep learning model is shown to: (a) generalize well across all three data sets, achieving AUC between 86% and 94%, with our external test set (LHMC) being at least twice as large as those used in other works; (b) perform better than the widely accepted PanCan Risk Model, achieving 6% and 9% better AUC scores on our two test sets; (c) improve on the state of the art represented by the winners of the Kaggle Data Science Bowl 2017 competition on lung cancer screening; (d) perform comparably to radiologists in estimating cancer risk at a patient level.
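The two evaluation ideas mentioned in the abstract, a train/validation split with no patient overlap and scoring by AUC, can be illustrated with a minimal sketch. This is not the authors' code: the function names `split_by_patient` and `auc`, the `(patient_id, scan_id)` record layout, and the 70% train fraction are all hypothetical choices made for illustration.

```python
import random


def split_by_patient(scans, train_fraction=0.7, seed=0):
    """Split (patient_id, scan_id) records so that no patient is on both sides.

    The split is made over unique patient IDs, not over scans, so every scan
    of a given patient lands in the same partition.
    """
    patients = sorted({patient_id for patient_id, _ in scans})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_train = int(len(patients) * train_fraction)
    train_ids = set(patients[:n_train])
    train = [s for s in scans if s[0] in train_ids]
    val = [s for s in scans if s[0] not in train_ids]
    return train, val


def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: 10 hypothetical patients with 2 scans each.
    scans = [(f"patient{i}", f"scan{i}_{j}") for i in range(10) for j in range(2)]
    train, val = split_by_patient(scans)
    overlap = {p for p, _ in train} & {p for p, _ in val}
    print("patients in both sets:", len(overlap))
    print("toy AUC:", auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Splitting on patient IDs rather than on individual scans is what prevents the same person from contributing to both training and validation. The pairwise AUC shown here is quadratic in the number of cases, which is fine for a toy example but would typically be replaced by a rank-based library routine on datasets of the size described above.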