SSNet: A Novel Transformer and CNN Hybrid Network for Remote Sensing Semantic Segmentation

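Computer science; Segmentation; Artificial intelligence; Transformer; Pattern recognition (psychology); Image segmentation; Computer vision; Natural language processing; Electrical engineering; Engineering; Voltage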

Authors

Min Yao, Y. H. Zhang, Guofeng Liu, Dongdong Pang

Source

Journal: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing [Institute of Electrical and Electronics Engineers]
Date: 2024-01-01 Volume: 17, pp. 3023-3037 Citations: 11

Identifier

DOI: 10.1109/jstars.2024.3349657

Abstract

Remote sensing semantic segmentation remains challenging because of the diversity and complexity of objects. Transformer-based models have achieved encouraging results in semantic segmentation and have significant advantages in capturing global feature dependencies; however, they ignore local feature details. Convolutional neural networks (CNNs), which use a different interaction mechanism from Transformer-based models, capture more small-scale local features but have difficulty capturing global features. In this paper, a new semantic segmentation framework named SSNet is proposed; it adopts an encoder-decoder structure and exploits the advantages of both local and global features. In addition, we build a Feature Fuse Module (FFM) and a Feature Inject Module (FIM) to thoroughly fuse these two styles of features. The former module captures the dependencies between different positions and channels to extract multi-scale features, which improves segmentation precision on similar objects. The latter module condenses the global information from the Transformer and injects it into the CNN to obtain a broad global field of view; within it, depth-wise strip convolution improves segmentation accuracy on tiny objects. A CNN-based decoder progressively recovers the feature map size, and an atrous spatial pyramid pooling (ASPP) block is adopted in the decoder to obtain multi-scale context. Skip connections are established between the decoder and the encoder; they retain important feature information from the shallow layers of the network and facilitate the flow of multi-scale features. To evaluate our model, we compare it with current state-of-the-art models on the WHDLD and Potsdam datasets. The experimental results indicate that our proposed model achieves more precise semantic segmentation. The code for this work can be downloaded at https://github.com/stu-yzZ/SSNet .
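
Two of the building blocks named in the abstract, atrous (dilated) convolution as used in ASPP and depth-wise strip convolution, have effects that are easy to quantify with simple arithmetic. The plain-Python sketch below computes the extent covered by a dilated kernel and compares the weight count of a dense k x k convolution with that of a depth-wise pair of 1 x k and k x 1 strips. The function names, the kernel size, the channel count, the dilation rates, and the choice of a 1 x k plus k x 1 pair are illustrative assumptions; none of them is taken from the SSNet paper or its repository.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Extent along one axis covered by a dilated (atrous) kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def standard_conv_weights(channels_in: int, channels_out: int, kernel_size: int) -> int:
    """Weight count of a dense k x k convolution (bias omitted)."""
    return channels_in * channels_out * kernel_size * kernel_size


def depthwise_strip_weights(channels: int, kernel_size: int) -> int:
    """Weight count of a depth-wise 1 x k strip plus a depth-wise k x 1 strip (bias omitted)."""
    return channels * kernel_size * 2


if __name__ == "__main__":
    # Hypothetical dilation rates applied to a 3 x 3 kernel.
    for rate in (1, 6, 12, 18):
        print(f"dilation {rate}: covers {effective_kernel_size(3, rate)} pixels per axis")

    # Hypothetical layer: 64 channels, kernel or strip length 7.
    print("dense 7 x 7 weights:", standard_conv_weights(64, 64, 7))
    print("depth-wise strip weights:", depthwise_strip_weights(64, 7))
```

Under these assumed settings, larger dilation rates widen the context a 3 x 3 kernel sees without adding weights, and the strip pair covers a long horizontal and vertical span with far fewer weights than a dense kernel of the same length.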
