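Computer science
Encoder
Artificial intelligence
Segmentation
Pattern recognition (psychology)
Image segmentation
Transformer
Pixel
Computer vision
Scale-space segmentation
Voltage
Quantum mechanics
Operating system
Physics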
Authors
Wang Bo, Fan Wang, Pengwei Dong, Chongyi Li
Identifier
DOI:10.1007/s11760-021-02115-w
Abstract
Automatic medical image segmentation assists doctors and is important for the diagnosis and treatment of various diseases. TransUNet, which integrates the advantages of transformers and CNNs, has achieved success in medical image segmentation tasks. However, TransUNet simply combines feature maps between the encoder and decoder via skip connections at the same resolution, which is an unnecessarily restrictive fusion design. Moreover, the positional encoding and input tokens in the standard transformer blocks of TransUNet have a fixed scale, which is not suitable for dense prediction. To alleviate these problems, we propose a novel architecture named multiscale TransUNet++ (MS-TransUNet++), which employs a multiscale and flexible feature fusion scheme between the encoder and decoder at different levels. The novel skip connections densely bridge the extracted feature representations at different resolutions, and the hybrid CNN-Transformer encoder with long-range dependencies passes the high-level features directly to each stage of the decoder. In addition, to obtain more effective feature representations, an efficient multiscale visual transformer is introduced for the feature encoder. More importantly, we employ a weighted loss function composed of focal loss, multiscale structural similarity and the Jaccard index to penalize the training error of medical image segmentation, jointly realizing pixel-level, patch-level and map-level optimization. Extensive experimental results demonstrate that the proposed multiscale TransUNet++ achieves competitive performance for prostate MR and liver CT image segmentation.
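As a rough illustration of the weighted loss described in the abstract, the sketch below combines a focal term (pixel level), a structural-similarity term averaged over several scales (patch level) and a soft Jaccard term (map level) for a binary mask. It is not the authors' implementation. The function names, the weights `w_focal`, `w_ssim`, `w_jaccard`, the focal `gamma`, and the number of scales are all assumptions, since the abstract does not state them. The similarity term is also a simplification: it uses whole-image statistics and a plain average across scales instead of the windowed, product-form MS-SSIM.

```python
import numpy as np


def focal_loss(p, y, gamma=2.0, eps=1e-7):
    """Pixel-level term: binary focal loss averaged over all pixels."""
    p = np.clip(p, eps, 1.0 - eps)
    pt = np.where(y == 1, p, 1.0 - p)  # probability given to the true class
    return float(np.mean(-((1.0 - pt) ** gamma) * np.log(pt)))


def jaccard_loss(p, y, eps=1e-7):
    """Map-level term: 1 minus the soft Jaccard index (IoU)."""
    inter = np.sum(p * y)
    union = np.sum(p) + np.sum(y) - inter
    return float(1.0 - (inter + eps) / (union + eps))


def _ssim_global(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM from whole-image statistics (assumption: no sliding window)."""
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (a.var() + b.var() + c2)
    return num / den


def _downsample2(x):
    """2x2 average pooling; an odd trailing row or column is trimmed."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def ms_ssim_loss(p, y, scales=3):
    """Patch-level surrogate: 1 minus the mean SSIM over `scales` resolutions."""
    a, b = p.astype(float), y.astype(float)
    vals = []
    for _ in range(scales):
        vals.append(_ssim_global(a, b))
        a, b = _downsample2(a), _downsample2(b)
    return float(1.0 - np.mean(vals))


def combined_loss(p, y, w_focal=1.0, w_ssim=1.0, w_jaccard=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (w_focal * focal_loss(p, y)
            + w_ssim * ms_ssim_loss(p, y)
            + w_jaccard * jaccard_loss(p, y))
```

Here `p` is a 2-D array of predicted foreground probabilities and `y` a binary mask of the same shape, large enough to be halved `scales - 1` times. A perfect prediction drives all three terms toward zero, while an inverted prediction is penalized by each of them.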