Abstract

Background: The morbidity of pulmonary embolism (PE) is exceeded only by that of coronary heart disease and hypertension. Early detection, early diagnosis, and timely treatment are key to reducing the risk of death. Automatic segmentation of PE is particularly important, yet it remains a challenging task. On the one hand, manually segmenting PE from a computed tomography (CT) sequence is very time‐consuming and prone to misdiagnosis. On the other hand, an accurate contour capturing the location, volume, and shape of PE can help radiotherapists carry out targeted treatment and thus greatly increase the survival rate of patients. An automatic and efficient PE segmentation approach is therefore urgently needed in clinical diagnosis.

Purpose: Accurate segmentation of PE is critical for its diagnosis. However, it remains a difficult problem in medical image processing because emboli regions vary in size and shape and the contrast between emboli and other tissues is low. To address this, we propose a deep neural network (CAM‐Wnet) that incorporates coordinate attention (CA) mechanisms and pyramid pooling modules (PPMs) to segment PE from CT images end to end.

Methods: The CAM‐Wnet architecture consists of a coarse U‐Net and a subdivision U‐Net stacked together. First, the coarse U‐Net uses a pretrained VGG‐19 as its encoder, which transfers features learned on ImageNet to the segmentation task. CA residual blocks (CARBs) are introduced into the decoder of the coarse network to capture a wider range of semantic information and model the correlation between channels. Then, the input image is multiplied by the preliminary mask and the product is fed into the subdivision U‐Net for a second round of feature distillation; both the encoder and the decoder of the subdivision U‐Net are also built from CARBs.
The PPMs are placed between the encoder and the decoder of both U‐Nets to exploit global context information and further strengthen feature extraction. Finally, an improved focal loss function is used to train the network and further improve segmentation quality.

Results: We tested the proposed architecture against the doctors’ manual contours from the China‐Japan Friendship Hospital dataset, using Precision, Recall, IoU, and F1‐score to evaluate PE segmentation accuracy. For PE, Precision was 0.9703, Recall was 0.963, IoU was 0.9353, and F1‐score was 0.9665. These results show that the proposed method can segment emboli in lung CT images automatically and accurately. We also tested the method on a public liver tumor segmentation dataset, where the results support its effectiveness and generalization ability.

Conclusions: By introducing multiscale pooling and attention mechanisms, CAM‐Wnet captures more global and semantic information. Experimental results showed that the proposed method effectively improved PE segmentation in lung CT images and could be applied to assist doctors in clinical treatment.
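
For readers unfamiliar with the four metrics reported in the Results, the following is a minimal sketch of how Precision, Recall, IoU, and F1‐score are commonly computed for one predicted binary mask against a manual contour. It is an illustration, not the authors' evaluation code: the function name `segmentation_metrics`, the nested-list mask format, and the choice to return 0.0 when a denominator is empty are all assumptions made here.

```python
from typing import Dict, Sequence

# Assumed mask format: a 2D grid of 0/1 values, where 1 marks an embolus pixel.
Mask = Sequence[Sequence[int]]


def segmentation_metrics(pred: Mask, truth: Mask) -> Dict[str, float]:
    """Pixel-wise Precision, Recall, IoU, and F1 for two same-sized binary masks."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # predicted embolus, truly embolus
            elif p:
                fp += 1  # predicted embolus, actually background
            elif t:
                fn += 1  # missed embolus pixel

    def ratio(num: int, den: int) -> float:
        # Assumption: report 0.0 when the denominator is empty.
        return num / den if den else 0.0

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "iou": ratio(tp, tp + fp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy example: 2 true positives, 1 false positive, 1 false negative.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
metrics = segmentation_metrics(pred, truth)
print(metrics)
```

For a single mask, IoU and F1 are tied by IoU = F1 / (2 - F1), so the two scores always rank a pair of masks the same way; values averaged over many images, as dataset-level results often are, need not satisfy the identity exactly.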