Computer science
Artificial intelligence
Segmentation
Encoder
Pattern recognition
Transformer
Image segmentation
Convolutional neural network
Deep learning
Feature
Computer vision
Authors
Shixiang Zhang, Yang Xu, Zebin Wu, Zhihui Wei
Identifier
DOI:10.1007/978-3-031-47637-2_21
Abstract
In recent years, the Vision Transformer has gradually replaced the CNN as the mainstream method in the field of medical image segmentation due to its powerful long-range dependency modeling ability. However, segmentation networks built on a pure transformer perform poorly in feature expression because they lack convolutional locality, and channel-dimension information is lost in the network. In this paper, we propose a novel segmentation network termed CTC-Net to address these problems. Specifically, we design a feature-enhanced transformer module with spatial-reduction attention that extracts regional details within image patches through depth-wise convolution. Point-wise convolution is then leveraged to capture non-linear relationships along the channel dimension. Furthermore, a parallel convolutional encoder branch and an inverted residual coordinate attention block are designed to mine clear dependencies among local context, channel-dimension features, and location information. Extensive experiments on the Synapse Multi-organ CT and ACDC (Automatic Cardiac Diagnosis Challenge) datasets show that our method outperforms CNN-based and pure-transformer methods, obtaining up to 1.72% and 0.68% improvement in DSC scores, respectively.
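A minimal PyTorch sketch of the idea the abstract describes for the feature-enhanced transformer module: spatial-reduction attention (in the style of PVT) paired with a feed-forward block that uses depth-wise convolution for regional detail and point-wise convolution for non-linear channel mixing. Module names, dimensions, and the exact wiring are illustrative assumptions, not the authors' released CTC-Net code.

```python
import torch
import torch.nn as nn


class SpatialReductionAttention(nn.Module):
    """Multi-head self-attention whose keys/values come from a spatially
    down-sampled feature map, shrinking the attention cost."""

    def __init__(self, dim, num_heads=4, sr_ratio=2):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, dim * 2)
        self.proj = nn.Linear(dim, dim)
        # Strided convolution reduces the key/value token grid by sr_ratio.
        self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, H, W):
        B, N, C = x.shape
        q = self.q(x).reshape(B, N, self.num_heads, C // self.num_heads).transpose(1, 2)
        # Down-sample spatially before computing keys and values.
        x_ = x.transpose(1, 2).reshape(B, C, H, W)
        x_ = self.sr(x_).reshape(B, C, -1).transpose(1, 2)
        x_ = self.norm(x_)
        kv = self.kv(x_).reshape(B, -1, 2, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
        k, v = kv[0], kv[1]
        attn = ((q @ k.transpose(-2, -1)) * self.scale).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


class ConvFeedForward(nn.Module):
    """Depth-wise conv captures local regional detail; point-wise (1x1) convs
    mix channels non-linearly, mirroring the description in the abstract."""

    def __init__(self, dim, expansion=4):
        super().__init__()
        hidden = dim * expansion
        self.pw1 = nn.Conv2d(dim, hidden, kernel_size=1)                               # point-wise expand
        self.dw = nn.Conv2d(hidden, hidden, kernel_size=3, padding=1, groups=hidden)   # depth-wise
        self.act = nn.GELU()
        self.pw2 = nn.Conv2d(hidden, dim, kernel_size=1)                               # point-wise project

    def forward(self, x, H, W):
        B, N, C = x.shape
        x = x.transpose(1, 2).reshape(B, C, H, W)
        x = self.pw2(self.act(self.dw(self.pw1(x))))
        return x.flatten(2).transpose(1, 2)


class FeatureEnhancedBlock(nn.Module):
    """One pre-norm transformer block: SR attention + convolutional feed-forward."""

    def __init__(self, dim, num_heads=4, sr_ratio=2):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = SpatialReductionAttention(dim, num_heads, sr_ratio)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = ConvFeedForward(dim)

    def forward(self, x, H, W):
        x = x + self.attn(self.norm1(x), H, W)
        x = x + self.ffn(self.norm2(x), H, W)
        return x


if __name__ == "__main__":
    H = W = 14
    tokens = torch.randn(2, H * W, 64)       # (batch, num patches, channels)
    block = FeatureEnhancedBlock(dim=64)
    print(block(tokens, H, W).shape)          # torch.Size([2, 196, 64])
```

The depth-wise/point-wise split keeps the added convolutional cost small while still injecting the locality that a pure-transformer encoder lacks, which is the trade-off the abstract motivates.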