Authors
Jin Yu, Rui Tian, Yu Qian, Qiang Cai, Guoqing Chao, Danqing Liu, Yanhui Guo
Abstract
Background: Pixel-level medical image segmentation is challenging because of variable target scales, complex geometric shapes, and low contrast. Although U-shaped hybrid networks have demonstrated strong performance, existing models often fail to effectively integrate the local features captured by convolutional neural networks (CNNs) with the global features provided by Transformers. Moreover, their self-attention mechanisms often place too little emphasis on critical spatial and channel information. To address these challenges, our goal was to develop a hybrid deep learning model that can effectively and robustly segment medical images, including but not limited to computed tomography (CT) and magnetic resonance (MR) images. Methods: We propose an effective hybrid U-shaped network, named the effective multi-scale context aggregation hybrid network (EMCAH-Net). EMCAH-Net integrates an effective multi-scale context aggregation (EMCA) block in the backbone, along with a dual-attention augmented self-attention (DASA) block embedded in the skip connections and bottleneck layers. Tailored to the characteristics of medical images, the EMCA block focuses on fine-grained local multi-scale feature encoding, whereas the DASA block enhances global representation learning by adaptively combining spatial and channel attention with self-attention. This approach not only effectively integrates local multi-scale and global features but also reinforces the skip connections, thereby highlighting segmentation targets and precisely delineating boundaries. The code is publicly available at https://github.com/AloneIsland/EMCAH-Net.
Results: Compared to previous state-of-the-art (SOTA) methods, EMCAH-Net achieves outstanding performance in medical image segmentation, with Dice similarity coefficient (DSC) scores of 84.73% (+2.85), 92.33% (+0.27), and 82.47% (+0.76) on the Synapse, automated cardiac diagnosis challenge (ACDC), and digital retinal images for vessel extraction (DRIVE) datasets, respectively; the values in parentheses are gains in percentage points. Additionally, it maintains computational efficiency in terms of model parameters and floating point operations (FLOPs). For instance, EMCAH-Net surpasses TransUNet on the Synapse dataset by 7.25 percentage points in DSC while requiring only 25% of the parameters and 71% of the FLOPs. Conclusions: EMCAH-Net has demonstrated significant advantages in segmenting multi-scale targets, small targets, and targets with blurred boundaries in medical images. Extensive experiments on abdominal multi-organ, cardiac, and retinal vessel segmentation tasks confirm that EMCAH-Net surpasses previous methods, including pure CNN, pure Transformer, and hybrid architectures.
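
For readers unfamiliar with the metric, the DSC reported above is the standard overlap measure DSC = 2|P ∩ G| / (|P| + |G|) between a predicted mask P and a ground-truth mask G. The following is a minimal Python sketch of that definition, assuming binary masks flattened to sequences of 0/1 values. The function name `dice_coefficient` and the convention of returning 1.0 for two empty masks are illustrative assumptions, not taken from the EMCAH-Net repository, whose evaluation code may differ (for example, in how it averages over classes or volumes).

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks.

    `pred` and `target` are equal-length sequences of 0/1 values
    (a flattened segmentation mask). Hypothetical helper for illustration.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")

    # |P ∩ G|: pixels labelled foreground in both masks
    intersection = sum(p * t for p, t in zip(pred, target))
    # |P| + |G|: total foreground pixels in each mask
    total = sum(pred) + sum(target)

    # Assumed convention: two empty masks count as a perfect match
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    prediction = [1, 1, 0, 0]
    ground_truth = [1, 0, 0, 0]
    print(f"DSC = {dice_coefficient(prediction, ground_truth):.4f}")
```

A score reported as 84.73% corresponds to a value of 0.8473 from a function like this, typically averaged over the evaluated classes and cases.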