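Computer science
Feature extraction
Artificial intelligence
Convolutional neural network
Transformer
Pattern recognition (psychology)
Voltage
Quantum mechanics
Physics
Authors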
Renhe Zhang,Zhechun Wan,Qian Zhang,Guixu Zhang
Source
Journal: IEEE Geoscience and Remote Sensing Letters
[Institute of Electrical and Electronics Engineers]
Date: 2023-01-01
Volume/Issue: 20: 1-5
Citations: 8
Identifier
DOI:10.1109/lgrs.2023.3304377
Abstract
Both local and global context dependencies are essential for building extraction from remote sensing (RS) images. Convolutional neural networks (CNNs) extract local spatial details well but lack the ability to model long-range dependencies. In recent years, Vision Transformers (ViTs) have shown great potential in modeling global context dependencies. However, they usually incur a high computational cost, and spatial details cannot be fully retained during feature extraction. To combine the advantages of CNNs and ViTs, we propose DSAT-Net, which integrates them in one model. In DSAT-Net, we design an efficient Dual Spatial Attention Transformer (DSAFormer) to address the shortcomings of the standard ViT. It has a dual attention structure whose two paths complement each other. Specifically, the global attention path (GAP) applies large-scale downsampling to the feature maps before computing global self-attention, which reduces the computational cost. The local attention path (LAP) uses efficient stripe convolution to generate local attention, which alleviates the information loss caused by the downsampling operation in the GAP and supplements the spatial details. In addition, we design a feature refining module, the Channel Mixing Feature Refine Module (CM-FRM), to fuse low-level and high-level features. Our model achieved competitive results on three public building extraction datasets. Code will be available at: https://github.com/stdcoutzrh/BuildingExtraction.
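The cost argument behind the two attention paths can be illustrated with simple counting. The sketch below is not the authors' implementation: the function names, the 64x64 feature map, the stride of 8, and the kernel size of 7 are all hypothetical. It also assumes that the whole feature map is downsampled before self-attention, and that "stripe convolution" means a pair of 1xk and kx1 depthwise kernels; the abstract specifies neither.

```python
def attention_pairs(height, width, stride=1):
    """Count query-key score entries for self-attention over a feature map.

    Assumption: the map is downsampled by `stride` along each spatial axis
    before attention, so both queries and keys come from the smaller map.
    """
    tokens = (height // stride) * (width // stride)
    return tokens * tokens


def square_kernel_weights(k):
    """Weights per channel for a k x k depthwise kernel."""
    return k * k


def stripe_kernel_weights(k):
    """Weights per channel for a 1 x k plus k x 1 depthwise pair (assumed form)."""
    return 2 * k


full = attention_pairs(64, 64)               # 4096 tokens
reduced = attention_pairs(64, 64, stride=8)  # 64 tokens
print(full // reduced)                       # 4096
print(square_kernel_weights(7), stripe_kernel_weights(7))  # 49 14
```

Under these assumptions, a stride-8 downsampling shrinks the attention score matrix by a factor of 4096, at the price of spatial detail, while the stripe pair covers the same 7-pixel extent along each axis with 14 weights per channel instead of 49. This is consistent with the abstract's description of a cheap local path that compensates for what the global path loses.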