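Computer science
Encoder
Convolutional neural network
Transformer
Artificial intelligence
Segmentation
Deep learning
Image segmentation
Pattern recognition (psychology)
Computer vision
Engineering
Voltage
Operating system
Electrical engineering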
Authors
Hu Cao, Yueyue Wang, Joy Chen, Dongsheng Jiang, Xiaopeng Zhang, Qi Tian, Manning Wang
Source
Journal: Cornell University - arXiv
Date: 2021-01-01
Citations: 535
Identifier
DOI: 10.48550/arxiv.2105.05537
Abstract
In the past few years, convolutional neural networks (CNNs) have achieved milestones in medical image analysis. Especially, the deep neural networks based on U-shaped architecture and skip-connections have been widely applied in a variety of medical image tasks. However, although CNN has achieved excellent performance, it cannot learn global and long-range semantic information interaction well due to the locality of the convolution operation. In this paper, we propose Swin-Unet, which is an Unet-like pure Transformer for medical image segmentation. The tokenized image patches are fed into the Transformer-based U-shaped Encoder-Decoder architecture with skip-connections for local-global semantic feature learning. Specifically, we use hierarchical Swin Transformer with shifted windows as the encoder to extract context features. And a symmetric Swin Transformer-based decoder with patch expanding layer is designed to perform the up-sampling operation to restore the spatial resolution of the feature maps. Under the direct down-sampling and up-sampling of the inputs and outputs by 4x, experiments on multi-organ and cardiac segmentation tasks demonstrate that the pure Transformer-based U-shaped Encoder-Decoder network outperforms those methods with full-convolution or the combination of transformer and convolution. The codes and trained models will be publicly available at https://github.com/HuCaoFighting/Swin-Unet.
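The shifted-window idea mentioned in the abstract can be illustrated with a small, dependency-free sketch. It is not taken from the Swin-Unet repository: the function names `cyclic_shift` and `window_partition`, the 4 x 4 token grid, and the window size of 2 are all illustrative assumptions. It follows the usual Swin Transformer formulation, in which alternate blocks roll the token grid by half the window size before partitioning it into non-overlapping windows, so that tokens that sat in different windows in one block share a window in the next.

```python
from typing import List

Grid = List[List[int]]


def cyclic_shift(grid: Grid, shift: int) -> Grid:
    """Roll the grid up and to the left by `shift` positions, wrapping around."""
    h, w = len(grid), len(grid[0])
    return [
        [grid[(r + shift) % h][(c + shift) % w] for c in range(w)]
        for r in range(h)
    ]


def window_partition(grid: Grid, window: int) -> List[List[int]]:
    """Split an H x W grid into non-overlapping window x window blocks.

    Each block is returned flattened in row-major order.
    """
    h, w = len(grid), len(grid[0])
    if h % window or w % window:
        raise ValueError("grid size must be divisible by the window size")
    windows = []
    for r0 in range(0, h, window):
        for c0 in range(0, w, window):
            windows.append(
                [grid[r][c]
                 for r in range(r0, r0 + window)
                 for c in range(c0, c0 + window)]
            )
    return windows


# Hypothetical toy input: a 4 x 4 grid of token indices 0..15, window size 2.
tokens = [[r * 4 + c for c in range(4)] for r in range(4)]

# Regular windows: attention would be restricted to each 2 x 2 block.
regular = window_partition(tokens, 2)

# Shifted windows: roll by half the window size (1), then partition again.
shifted = window_partition(cyclic_shift(tokens, 1), 2)
```

In this toy case the first regular window holds tokens 0, 1, 4, 5, while the first shifted window holds tokens 5, 6, 9, 10, which come from four different regular windows. In a real model, the windows that wrap around the border after the roll are normally masked so that tokens from opposite edges do not attend to each other; that masking is omitted here.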