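Computer science
Encoder
Transformer
Convolutional neural network
Artificial intelligence
Segmentation
Deep learning
Image segmentation
Pattern recognition (psychology)
Computer vision
Voltage
Quantum mechanics
Operating system
Physics
Authors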
Hu Cao,Yueyue Wang,Joy Chen,Dongsheng Jiang,Xiaopeng Zhang,Qi Tian,Manning Wang
Identifier
DOI:10.1007/978-3-031-25066-8_9
Abstract
In the past few years, convolutional neural networks (CNNs) have achieved milestones in medical image analysis. In particular, deep neural networks based on U-shaped architectures with skip-connections have been widely applied to a variety of medical image tasks. However, although CNNs have achieved excellent performance, they cannot learn global semantic information interaction well because of the locality of the convolution operation. In this paper, we propose Swin-Unet, a Unet-like pure Transformer for medical image segmentation. The tokenized image patches are fed into a Transformer-based U-shaped Encoder-Decoder architecture with skip-connections for local-global semantic feature learning. Specifically, we use a hierarchical Swin Transformer with shifted windows as the encoder to extract context features. A symmetric Swin Transformer-based decoder with a patch expanding layer is designed to perform the up-sampling operation and restore the spatial resolution of the feature maps. With direct $4\times$ down-sampling of the inputs and $4\times$ up-sampling of the outputs, experiments on multi-organ and cardiac segmentation tasks demonstrate that the pure Transformer-based U-shaped Encoder-Decoder network outperforms methods based on full convolution or on a combination of Transformer and convolution. The code is publicly available at https://github.com/HuCaoFighting/Swin-Unet.
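The resolution bookkeeping implied by the abstract (4x down-sampling at the input, a hierarchical encoder, a symmetric decoder with patch expanding, and 4x up-sampling at the output) can be sketched as a plain shape trace. This is an illustration only, not the authors' implementation: the function name `trace_shapes`, the 224-pixel input, the embedding width of 96, and the three merging stages are assumptions that the abstract does not state, and no attention computation is performed.

```python
from typing import List, Tuple

# (height, width, channels) of the token grid at a given stage
Shape = Tuple[int, int, int]


def trace_shapes(image_size: int = 224, patch_size: int = 4,
                 embed_dim: int = 96, num_merges: int = 3) -> List[Tuple[str, Shape]]:
    """Trace token-grid shapes through a U-shaped encoder-decoder.

    All default values are assumptions chosen for illustration.
    """
    if image_size % (patch_size * 2 ** num_merges) != 0:
        raise ValueError("image_size must be divisible by patch_size * 2**num_merges")

    stages: List[Tuple[str, Shape]] = []

    # Patch embedding: each patch_size x patch_size patch becomes one token,
    # which is the 4x down-sampling of the input mentioned in the abstract.
    side = image_size // patch_size
    dim = embed_dim
    stages.append(("patch embedding", (side, side, dim)))

    # Encoder: each merging step halves the resolution and doubles the channels.
    for i in range(num_merges):
        side //= 2
        dim *= 2
        stages.append((f"patch merging {i + 1}", (side, side, dim)))

    # Decoder: each expanding step doubles the resolution and halves the channels,
    # so every decoder stage lines up with an encoder stage for a skip-connection.
    for i in range(num_merges):
        side *= 2
        dim //= 2
        stages.append((f"patch expanding {i + 1}", (side, side, dim)))

    # Final up-sampling by patch_size restores the input resolution.
    side *= patch_size
    stages.append((f"final {patch_size}x expanding", (side, side, dim)))
    return stages


if __name__ == "__main__":
    for name, shape in trace_shapes():
        print(f"{name:>20}: {shape}")
```

Under these assumed defaults the decoder retraces the encoder's resolutions in reverse order, which is what lets a skip-connection join feature maps of equal size.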