Crisscross-Global Vision Transformers Model for Very High Resolution Aerial Image Semantic Segmentation
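Computer science
Artificial intelligence
Segmentation
High resolution
Image segmentation
Computer vision
Remote sensing
Aerial imagery
Geology
Image (mathematics)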
Authors
Guohui Deng, Zhaocong Wu, Miaozhong Xu, Chengjun Wang, Zhiye Wang, Zhongyuan Lu
Source
Journal: IEEE Transactions on Geoscience and Remote Sensing [Institute of Electrical and Electronics Engineers] Date: 2023-01-01 Volume/issue: 61: 1-19 Citations: 3
Identifier
DOI:10.1109/tgrs.2023.3276172
Abstract
Semantic segmentation is a key means of understanding very-high-resolution (VHR) aerial imagery. With the rapid development of deep learning, deep learning methods are increasingly applied to the segmentation of VHR images, with convolutional neural networks (CNNs) as the basic framework. However, owing to the highly complex details present in VHR images and the high spatial dependence of geographical objects, CNN-based methods are inadequate: the inherent locality of CNNs limits the size of the receptive field and therefore the ability to capture long-range context information. To solve this problem, we propose a novel transformer-based deep learning model called crisscross-global vision transformers (CGVT). CGVT exploits the transformer's inherent ability to capture long-range context information to overcome the restricted receptive field. Specifically, we redesign the self-attention mechanism in the transformer and call the result crisscross-global attention. It consists of two parts: the crisscross transformer encoder block (CC-TEB) and the global squeeze transformer encoder block (GS-TEB). CC-TEB overcomes a limitation of the traditional self-attention design (specifically, the difficulty of applying it to VHR aerial image segmentation) and further increases the local feature representation ability of the model. GS-TEB increases the global feature representation ability of the model. Experiments conducted on the popular ISPRS Vaihingen, IEEE GRSS Data Fusion Contest Zeebrugge, and LoveDA Semantic Segmentation Challenge datasets verify the effectiveness and superiority of the proposed method. Specifically, it achieved state-of-the-art performance on both the Zeebrugge and LoveDA datasets, and is currently ranked second on the Vaihingen dataset.
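The abstract does not give the internals of CC-TEB, so the following is only a minimal sketch of a generic criss-cross attention pattern, in which each pixel attends to the pixels in its own row and column rather than to every pixel in the image. It is not the authors' implementation. The function name `crisscross_attention`, the single-head layout, the (H, W, C) tensor shape, and the joint softmax over row and column scores are all assumptions made for illustration. The appeal of this pattern for large images is that the number of query-key pairs grows as HW(H + W - 1) rather than (HW)^2 for full self-attention.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; -inf entries receive zero weight.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def crisscross_attention(q, k, v):
    """Hypothetical single-head criss-cross attention.

    q, k, v: arrays of shape (H, W, C). Returns an array of shape (H, W, C).
    Each position (i, j) attends only to positions in row i and column j.
    """
    H, W, C = q.shape
    scale = 1.0 / np.sqrt(C)

    # Scores between (i, j) and every (i, w) in the same row: shape (H, W, W).
    row_scores = np.einsum("ijc,iwc->ijw", q, k) * scale
    # Scores between (i, j) and every (h, j) in the same column: shape (H, W, H).
    col_scores = np.einsum("ijc,hjc->ijh", q, k) * scale

    # Position (i, j) lies in both its row and its column; mask it out of the
    # column branch so it is counted once.
    self_mask = np.eye(H, dtype=bool)[:, None, :]  # True where h == i
    col_scores = np.where(self_mask, -np.inf, col_scores)

    # One softmax over the H + W - 1 positions on the cross.
    weights = softmax(np.concatenate([row_scores, col_scores], axis=-1), axis=-1)
    row_w, col_w = weights[..., :W], weights[..., W:]

    # Weighted sum of values along the row and along the column.
    return (np.einsum("ijw,iwc->ijc", row_w, v)
            + np.einsum("ijh,hjc->ijc", col_w, v))


# Example on a small random feature map (shapes chosen arbitrarily).
rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 10, 16))
out = crisscross_attention(feat, feat, feat)
```

A single pass of this pattern only propagates information along one row and one column. How CGVT combines it with the global branch (GS-TEB) to recover full-image context is described in the paper itself, not in this sketch.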