Authors
Renhe Zhang, Qian Zhang, Guixu Zhang
Source
Journal: IEEE Geoscience and Remote Sensing Letters [Institute of Electrical and Electronics Engineers]
Date: 2023-01-01
Volume/Issue: 20: 1-5
Citations: 15
Identifier
DOI: 10.1109/lgrs.2023.3270303
Abstract
Benefiting from effective global information interaction, vision transformers (ViTs) have been widely used in the building extraction task. However, buildings in remote sensing (RS) images usually differ greatly in size. Mainstream ViT-based segmentation models for RS images are based on the Swin Transformer, which lacks multi-scale information inside the ViT block. In addition, they connect only the output of the entire ViT encoder block to the decoder, which ignores the similarity information of the attention maps inside the ViT encoder block and fails to provide better global dependencies for the decoder. To solve the above problems, we introduce a novel Shunted Transformer, which enables the model to capture multi-scale information internally while fully establishing global dependencies, to build a pure ViT-based U-shaped model for building extraction. Furthermore, unlike the single-skip-connection structure of previous U-shaped methods, we build a novel dual skip connection structure inside the model. It simultaneously transmits the attention maps inside the ViT encoder block and its entire output to the decoder, thereby fully mining the information of the ViT encoder block and providing better global information guidance for the decoder. Thus, our model is named Shunted Dual Skip Connection UNet (SDSC-UNet). We also design a feature fusion module called the Dual Skip Upsample Fusion Module (DSUFM) to aggregate this information. Our model yields state-of-the-art (SOTA) performance (83.02% IoU) on the Inria Aerial Image Labeling Dataset. Code will be available.
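The dual-skip idea in the abstract can be sketched in a few lines: a ViT encoder block exposes both its output tokens and its attention map, and a fusion module combines them with the decoder feature. This is only a minimal NumPy illustration of that data flow under assumed shapes; `encoder_block` and `dsufm` are hypothetical stand-ins, not the paper's actual Shunted Transformer or DSUFM implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder_block(x, dim):
    """Toy single-head attention block that returns BOTH its output
    tokens and its attention map, so a decoder can receive a dual
    skip connection (hypothetical stand-in for the paper's block)."""
    q = x @ rng.standard_normal((x.shape[1], dim))
    k = x @ rng.standard_normal((x.shape[1], dim))
    v = x @ rng.standard_normal((x.shape[1], dim))
    scores = q @ k.T / np.sqrt(dim)
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)      # (n, n) attention map
    out = attn @ v                                # (n, dim) block output
    return out, attn

def dsufm(decoder_feat, skip_out, skip_attn):
    """Hypothetical dual-skip fusion: reuse the attention map's
    similarity information alongside the encoder output, then
    project the concatenation back to the decoder width."""
    attn_feat = skip_attn @ skip_out              # attention-weighted features
    fused = np.concatenate([decoder_feat, skip_out, attn_feat], axis=-1)
    w = rng.standard_normal((fused.shape[1], decoder_feat.shape[1]))
    return fused @ w

tokens = rng.standard_normal((16, 32))            # 16 tokens, width 32
enc_out, enc_attn = encoder_block(tokens, 32)
dec = dsufm(rng.standard_normal((16, 32)), enc_out, enc_attn)
print(dec.shape)   # (16, 32)
```

The point of the sketch is the interface: each encoder stage hands the decoder two tensors (output and attention map) instead of one, which is what distinguishes the dual skip connection from a conventional U-shaped single skip.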