Keywords
Computer science; Upsampling; Convolutional neural network; Segmentation; Artificial intelligence; Encoder; Image segmentation; Pattern recognition; Image; Parallel computing
Authors
Zhiwei Liang, Zhao Kui, Gang Liang, Siyu Li, Yifei Wu, Yiping Zhou
Identifier
DOI: 10.1016/j.knosys.2023.110987
Abstract
Convolutional neural networks (CNNs), especially U-shaped networks, have become the mainstream approach for medical image segmentation. However, because convolution is an intrinsically local operation, CNNs are inherently limited in capturing long-range dependencies. Transformer-based methods model long-range dependencies and have demonstrated remarkable performance in computer vision, but their quadratic computational complexity and reliance on large-scale pre-training present challenges, particularly for higher-resolution medical images. In this paper, we introduce MAXFormer, a U-shaped hierarchical network that effectively leverages both the global context within individual samples and the relationships between different samples. Our Transformer module reformulates the self-attention mechanism into two parts: local–global attention and external attention. Local–global attention provides an efficient, linear-complexity alternative to self-attention, employing a parallel architecture that enables local–global spatial interaction: the local attention branch captures high-frequency local information, while the global attention branch captures low-frequency global information. Furthermore, we design a Refined Fused Connection module that merges the feature output of each encoder block with the decoder output, mitigating the loss of spatial detail caused by downsampling. Extensive experiments on two different medical image segmentation datasets show that our proposed method outperforms other state-of-the-art methods without requiring pre-trained weights. Code will be available at https://github.com/zhiwei-liang/MAXFormer.
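The two attention components described in the abstract can be sketched in PyTorch. The sketch below is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation: the module names, the pooled-grid size `pool_size`, and the memory size `mem_size` are assumptions, and the exact branch designs in MAXFormer may differ. The local branch uses a depthwise convolution for high-frequency local detail; the global branch attends to a pooled summary of the feature map; the external-attention class follows the standard published formulation with two small learnable memories shared across all samples.

```python
import torch
import torch.nn as nn


class LocalGlobalAttention(nn.Module):
    """Hypothetical parallel local-global attention block (not the authors' code).

    Local branch: depthwise conv captures high-frequency local detail.
    Global branch: queries attend to a pooled P x P summary of the feature
    map, so the cost is linear in the number of spatial positions.
    """

    def __init__(self, dim, pool_size=7, kernel_size=3):
        super().__init__()
        self.local = nn.Conv2d(dim, dim, kernel_size,
                               padding=kernel_size // 2, groups=dim)
        self.pool = nn.AdaptiveAvgPool2d(pool_size)   # low-frequency summary
        self.q = nn.Conv2d(dim, dim, 1)
        self.kv = nn.Conv2d(dim, 2 * dim, 1)
        self.proj = nn.Conv2d(dim, dim, 1)
        self.scale = dim ** -0.5

    def forward(self, x):                              # x: (B, C, H, W)
        b, c, h, w = x.shape
        local = self.local(x)                          # (B, C, H, W)
        q = self.q(x).flatten(2).transpose(1, 2)       # (B, HW, C)
        k, v = self.kv(self.pool(x)).flatten(2).chunk(2, dim=1)  # (B, C, P*P) each
        attn = (q @ k) * self.scale                    # (B, HW, P*P): linear in HW
        attn = attn.softmax(dim=-1)
        g = (attn @ v.transpose(1, 2)).transpose(1, 2).reshape(b, c, h, w)
        return self.proj(local + g)                    # fuse the parallel branches


class ExternalAttention(nn.Module):
    """Standard external attention: two learnable memories shared across the
    dataset, so attention also encodes relations between different samples."""

    def __init__(self, dim, mem_size=64):
        super().__init__()
        self.mk = nn.Linear(dim, mem_size, bias=False)  # external key memory
        self.mv = nn.Linear(mem_size, dim, bias=False)  # external value memory

    def forward(self, x):                               # x: (B, N, C) tokens
        attn = self.mk(x)                               # (B, N, M)
        attn = attn.softmax(dim=1)                      # double normalization:
        attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-6)
        return self.mv(attn)                            # (B, N, C), linear in N


if __name__ == "__main__":
    x = torch.randn(2, 64, 56, 56)
    print(LocalGlobalAttention(64)(x).shape)            # torch.Size([2, 64, 56, 56])
    tokens = torch.randn(2, 56 * 56, 64)
    print(ExternalAttention(64)(tokens).shape)          # torch.Size([2, 3136, 64])
```

Pooling the keys and values to a fixed P x P grid reduces attention cost from O((HW)^2) to O(HW * P^2), which is how a design of this kind achieves the linear complexity the abstract claims.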