Authors
Nahid Ul Islam, Zongwei Zhou, Shiv Gehlot, Michael B. Gotway, Jianming Liang
Abstract
Pulmonary Embolism (PE) is a thrombus ("blood clot"), usually originating from a lower extremity vein, that travels to the blood vessels in the lung, causing vascular obstruction and, in some patients, death. This disorder is commonly diagnosed using Computed Tomography Pulmonary Angiography (CTPA). Deep learning holds great promise for the Computer-aided Diagnosis (CAD) of PE. However, numerous deep learning methods, such as Convolutional Neural Networks (CNNs) and Transformer-based models, exist for a given task, causing great confusion regarding the development of CAD systems for PE. To address this confusion, we present a comprehensive analysis of competing deep learning methods applicable to PE diagnosis based on four datasets. First, we use the RSNA PE dataset, which includes (weak) slice-level and exam-level labels, for PE classification and diagnosis, respectively. At the slice level, we compare CNNs with the Vision Transformer (ViT) and the Swin Transformer. We also investigate the impact of self-supervised versus (fully) supervised ImageNet pre-training, and of transfer learning versus training models from scratch. Additionally, at the exam level, we compare sequence model learning with our proposed transformer-based architecture, Embedding-based ViT (E-ViT). For the second and third datasets, we utilize the CAD-PE Challenge Dataset and Ferdowsi University of Mashhad's PE Dataset, where we convert (strong) clot-level masks into slice-level annotations to evaluate the optimal CNN model for slice-level PE classification. Finally, we use our in-house PE-CAD dataset, which contains (strong) clot-level masks. Here, we investigate the impact of our vessel-oriented image representations and self-supervised pre-training on PE false positive reduction at the clot level across image dimensions (2D, 2.5D, and 3D).
Our experiments show that (1) transfer learning boosts performance despite differences between photographic images and CTPA scans; (2) self-supervised pre-training can surpass (fully) supervised pre-training; (3) transformer-based models demonstrate comparable performance but slower convergence compared with CNNs for slice-level PE classification; (4) a model trained on the RSNA PE dataset demonstrates promising performance when tested on unseen datasets for slice-level PE classification; (5) our E-ViT framework excels in handling variable numbers of slices and outperforms sequence model learning for exam-level diagnosis; and (6) vessel-oriented image representation and self-supervised pre-training both enhance performance for PE false positive reduction across image dimensions. Our optimal approach surpasses state-of-the-art results on the RSNA PE dataset, enhancing AUC by 0.62% (slice-level) and 2.22% (exam-level). On our in-house PE-CAD dataset, 3D vessel-oriented images improve performance from 80.07% to 91.35%, a gain of 11.28 percentage points. Code is available at GitHub.com/JLiangLab/CAD_PE.
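The conversion of clot-level masks into slice-level annotations, mentioned above for the CAD-PE and Ferdowsi datasets, can be illustrated with a minimal sketch. This is not the authors' code: the function name `masks_to_slice_labels`, the nested-list mask layout (slices, rows, columns), and the rule that a slice counts as PE-positive once it contains at least `min_voxels` marked voxels are all assumptions made for illustration.

```python
from typing import List, Sequence


def masks_to_slice_labels(
    volume_mask: Sequence[Sequence[Sequence[int]]],
    min_voxels: int = 1,
) -> List[int]:
    """Derive one binary label per axial slice from a clot-level mask.

    volume_mask is indexed as [slice][row][column]; any nonzero voxel is
    treated as clot. The min_voxels threshold is a hypothetical parameter.
    """
    labels = []
    for slice_mask in volume_mask:
        # Count the voxels marked as clot in this slice.
        clot_voxels = sum(1 for row in slice_mask for v in row if v != 0)
        labels.append(1 if clot_voxels >= min_voxels else 0)
    return labels


# Toy 3-slice volume: no clot, one clot voxel, three clot voxels.
toy_mask = [
    [[0, 0], [0, 0]],
    [[0, 1], [0, 0]],
    [[1, 1], [0, 1]],
]
print(masks_to_slice_labels(toy_mask))
```

Raising `min_voxels` makes the labeling stricter, for instance to ignore slices where only a stray voxel of the mask is present; whether such a threshold was used in the study is not stated in the abstract.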