Keywords
Computer science, Segmentation, Artificial intelligence, Image segmentation, Computer vision, Scale (ratio), Transformer, Remote sensing, Pattern recognition, Geography, Cartography
Authors
Honglin Wu, Min Zhang, Peng Huang, Wenlong Tang
Identifier
DOI:10.1109/jstars.2024.3375313
Abstract
The characteristics of remote sensing images, such as complex ground objects, rich feature details, large intra-class variance, and small inter-class variance, usually require deep learning semantic segmentation methods to have strong feature learning and representation ability. Due to the limitation of the convolutional operation, Convolutional Neural Networks (CNNs) are good at capturing local details but perform poorly at modelling long-range dependencies. Transformers rely on multi-head self-attention mechanisms to extract global contextual information, but this usually incurs high computational complexity. Therefore, this paper proposes the CNN and Multi-scale Local-context Transformer network (CMLFormer), a novel encoder-decoder structured network for remote sensing image semantic segmentation. Specifically, for the features extracted by the lightweight ResNet18 encoder, we design a transformer decoder based on the Multi-scale Local-context Transform Block (MLTB) to enhance feature learning. By using a self-attention mechanism with non-overlapping windows, together with multi-scale horizontal and vertical interactive stripe convolutions, the MLTB is able to capture both local and global feature information at different scales with low complexity. Additionally, a Feature Enhanced Module (FEM) is introduced into the encoder to further facilitate the learning of global and local information. Experimental results show that our proposed CMLFormer exhibits excellent performance on the Vaihingen and Potsdam datasets. The code is available at https://github.com/DrWuHonglin/CMLFormer.
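The abstract's key mechanism is pairing windowed self-attention with multi-scale horizontal and vertical stripe convolutions, which together cover a cross-shaped receptive field at several scales. The following is a minimal NumPy sketch of that stripe-convolution idea only, not the authors' implementation: the uniform 1×k / k×1 kernels, the scale set, and the averaging over branches are all illustrative assumptions (the real MLTB uses learned kernels inside a transformer block; see the linked repository).

```python
import numpy as np

def stripe_conv(x, k, axis):
    """Uniform 1 x k (axis=1) or k x 1 (axis=0) stripe filter.

    Edge padding keeps the output the same size as the input.
    A uniform (mean) kernel stands in for a learned stripe convolution.
    """
    pad_width = [(0, 0), (0, 0)]
    pad_width[axis] = (k // 2, k - 1 - k // 2)
    xp = np.pad(x, pad_width, mode="edge")
    kernel = np.ones(k) / k
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), axis, xp
    )

def multi_scale_stripe_context(x, scales=(3, 7, 11)):
    """Average horizontal and vertical stripe responses at several scales.

    Mimics the cross-shaped, multi-scale context aggregation described in
    the abstract; the scale set (3, 7, 11) is an assumed example.
    """
    out = np.zeros_like(x, dtype=float)
    for k in scales:
        out += stripe_conv(x, k, axis=1)  # horizontal 1 x k stripe
        out += stripe_conv(x, k, axis=0)  # vertical   k x 1 stripe
    return out / (2 * len(scales))
```

Each stripe branch is O(k) per pixel rather than O(k²) as a full k×k convolution would be, which illustrates why stripe decompositions keep the complexity of large-receptive-field context aggregation low.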