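Computer science
Encoder
Segmentation
Artificial intelligence
Convolutional neural network
Transformer
Upsampling
Deep learning
Discriminative model
Feature learning
Pattern recognition (psychology)
Computer vision
Image (mathematics)
Physics
Quantum mechanics
Voltage
Operating system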
Authors
Dong Ren, Falin Li, Hang Sun, Li Liu, Ren Shun, Mei Yu
Identifier
DOI:10.1080/01431161.2023.2292550
Abstract
Semantic segmentation of remote sensing images is crucial for various practical applications. In deep learning, convolutional neural networks (CNNs) have been the primary approach to semantic segmentation over the past decade. Recently, Transformer-based models have achieved superior segmentation performance thanks to their strong global modelling capabilities. However, Transformer-based models tend to focus on extracting global contextual information, which leads to suboptimal segmentation of local edges and makes it difficult to preserve fine-grained details during patch token downsampling. Inspired by the local receptive field of CNNs, this article proposes a Local-Enhanced Multi-Scale Aggregation Swin Transformer (LMA-Swin) for semantic segmentation of high-resolution remote sensing images. Specifically, we adopt Swin Transformer as the main encoder, introduce convolutional blocks as an auxiliary encoder, and design a feature modulation module (FMM) to integrate the local contextual modelling ability of CNNs into the Transformer backbone. Additionally, we propose a novel cross-aggregation decoder (CAD) that aggregates shallow edge information and deep semantic information, thereby enhancing the discriminative ability for multi-scale objects. Experiments on the ISPRS Vaihingen and Potsdam datasets show that the proposed approach noticeably improves segmentation performance. Code: https://github.com/patricklee16/LMA-Swin.
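To make the "patch token downsampling" concern concrete, the following minimal sketch computes the token grid at each stage of a hierarchical Swin-style encoder. It is an illustration only: the function name `swin_stage_shapes` is hypothetical, the defaults (patch size 4, embedding dimension 96, four stages) are those of the original Swin-T design rather than anything stated in the abstract, and the 512 × 512 input tile is an assumed example size. The configuration actually used by LMA-Swin may differ.

```python
from typing import List, Tuple


def swin_stage_shapes(
    height: int,
    width: int,
    embed_dim: int = 96,   # assumption: Swin-T default, not taken from the abstract
    patch_size: int = 4,   # assumption: Swin default patch size
    num_stages: int = 4,   # assumption: standard four-stage hierarchy
) -> List[Tuple[int, int, int, int]]:
    """Return (stage, tokens_h, tokens_w, channels) for each encoder stage."""
    shapes = []
    stride = patch_size
    channels = embed_dim
    for stage in range(1, num_stages + 1):
        shapes.append((stage, height // stride, width // stride, channels))
        stride *= 2    # patch merging halves each spatial dimension
        channels *= 2  # and doubles the channel width
    return shapes


if __name__ == "__main__":
    size = 512  # assumed tile size, for illustration only
    for stage, h, w, c in swin_stage_shapes(size, size):
        pixels_per_token_side = size // h
        print(
            f"stage {stage}: {h}x{w} tokens, {c} channels, "
            f"each token covers {pixels_per_token_side}x{pixels_per_token_side} pixels"
        )
```

Under these assumptions, a token in the last stage summarises a 32 × 32 pixel region, which is why thin edges and small objects are hard to recover from deep features alone and why the abstract pairs the Transformer with a convolutional auxiliary encoder and a decoder that reuses shallow features.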