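Computer science
Encoder
Artificial intelligence
Convolutional neural network
Segmentation
Transformer
Pattern recognition (psychology)
Computer vision
Quantum mechanics
Operating system
Physics
Voltage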
Authors
Honglin Wu,Peng Huang,Min Zhang,Wenlong Tang,Xinyu Yu
Source
Journal: IEEE Transactions on Geoscience and Remote Sensing
[Institute of Electrical and Electronics Engineers]
Date: 2023-01-01
Volume/Issue: 61: 1-12
Citations: 35
Identifier
DOI:10.1109/tgrs.2023.3314641
Abstract
Convolutional neural networks (CNNs) are powerful in extracting local information but lack the ability to model long-range dependencies. In contrast, the transformer relies on multihead self-attention mechanisms to effectively extract global contextual information and thus model long-range dependencies. In this paper, we propose a novel encoder-decoder structured semantic segmentation network, named the CNN and multiscale transformer fusion network (CMTFNet), to extract and fuse local information and multiscale global contextual information from high-resolution remote sensing images. Specifically, to further process the output features from the CNN encoder, we build a transformer decoder based on the multiscale multihead self-attention (M2SA) module for extracting rich multiscale global contextual information and channel information. Additionally, the transformer block introduces an efficient feed-forward network (E-FFN) to enhance the information interaction between different channels of the feature. Finally, the multiscale attention fusion (MAF) module fully fuses the feature information from different levels. We have conducted extensive comparison and ablation experiments on the International Society for Photogrammetry and Remote Sensing (ISPRS) Vaihingen and Potsdam datasets. The experimental results demonstrate that the proposed CMTFNet achieves superior performance compared with currently popular methods. The code will be available at https://github.com/DrWuHonglin/CMTFNet.
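The contrast the abstract draws between local convolution and global self-attention can be illustrated with a minimal sketch of plain scaled dot-product self-attention. This is not the M2SA module or any part of CMTFNet: it assumes a single head, identity query/key/value projections, and a hypothetical four-token input, and is written in pure Python only to show that every output position is a weighted mix of all positions, however far apart.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption for illustration: queries, keys, and values are the
    tokens themselves (identity projections, no learned weights).
    Returns (outputs, weights), where weights[i][j] is how strongly
    position i attends to position j.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in tokens:
        # Score the query against every position, near or far.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # Each output is a weighted average over all positions.
        outputs.append(
            [sum(w[j] * tokens[j][c] for j in range(len(tokens))) for c in range(d)]
        )
    return outputs, weights


# Hypothetical 1-D "image" of four feature vectors: positions 0 and 3
# share the same feature even though they are not adjacent.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(tokens)
```

In this toy input, position 0 attends more strongly to the distant but similar position 3 than to its immediate neighbor at position 1, which a small convolution kernel centered on position 0 could not do in a single layer.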