Computer science
Encoder
Transformer
Segmentation
Artificial intelligence
Image segmentation
Computer engineering
Computer vision
Pattern recognition
Engineering
Authors
Peixu Wang,Shikun Liu,Jialin Peng
Identifier
DOI:10.1109/icpr56361.2022.9956705
Abstract
Encoder-decoder networks based on local convolutions have shown state-of-the-art results on various medical image segmentation tasks. However, they have limited ability to capture long-range spatial context, which has motivated the development of Transformers with attention mechanisms. Despite their success, Transformers usually have limitations in processing large medical image volumes due to their high computational complexity and their reliance on large-scale pre-training. Hence, we introduce a hybrid encoder-decoder that utilizes both lightweight convolution modules and an axial-spatial transformer (AST) module in the encoder. To better capture multi-view and multi-scale features, we integrate axial and spatial attention in the AST module to learn long-range dependencies, while convolution operations extract local dependencies and rich local features. Compared to pure vision transformers, the hybrid model has far fewer learnable parameters, which is desirable for clinical use. Experimental results on three challenging benchmarks demonstrate the competitive performance of the proposed model against state-of-the-art methods.
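The key efficiency idea behind axial attention is to replace full 2-D self-attention, whose cost is quadratic in the number of pixels, with attention computed independently along each spatial axis. The following is a minimal single-head NumPy sketch of that idea, not the authors' AST module: the function name, the single-head formulation, and the absence of positional encodings and multi-scale fusion are all simplifying assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def axial_attention(x, wq, wk, wv, axis):
    """Single-head self-attention along one spatial axis (illustrative).

    x: (H, W, C) feature map; wq/wk/wv: (C, C) projection matrices.
    axis=0 attends along height, axis=1 along width, so the cost is
    O(H*W*(H+W)) rather than O((H*W)^2) for full 2-D attention.
    """
    if axis == 0:
        x = x.transpose(1, 0, 2)  # make height the sequence axis
    q, k, v = x @ wq, x @ wk, x @ wv
    scale = 1.0 / np.sqrt(q.shape[-1])
    # batched attention: the first index is the "batch" of rows/columns,
    # the second is the sequence of positions along the chosen axis
    scores = softmax(np.einsum('bnc,bmc->bnm', q, k) * scale, axis=-1)
    out = np.einsum('bnm,bmc->bnc', scores, v)
    if axis == 0:
        out = out.transpose(1, 0, 2)  # restore (H, W, C) layout
    return out
```

Applying height-axis and width-axis attention in sequence lets every position aggregate information from the whole plane in two passes, which is why axial designs scale to large volumes where full attention does not.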