Local Semantic Feature Aggregation-Based Transformer for Hyperspectral Image Classification

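Computer science; Convolutional neural network; Artificial intelligence; Pattern recognition (psychology); Embedding; Transformer; Pixel; Hyperspectral imaging; Feature learning; Context; Image classification; Image (mathematics); Quantum mechanics; Physics; Voltage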

Authors

Bing Tu,Xiaolong Liao,Qianming Li,Yishu Peng,Antonio Plaza

Source

Journal: IEEE Transactions on Geoscience and Remote Sensing [Institute of Electrical and Electronics Engineers]
Date: 2022-01-01 Volume/issue: 60: 1-15 Cited by: 29

Identifier

DOI: 10.1109/tgrs.2022.3201145

Abstract

Hyperspectral images (HSIs) contain abundant information in the spatial and spectral domains, allowing for a precise characterization of categories of materials. Convolutional neural networks (CNNs) have achieved great success in HSI classification, owing to their excellent ability in local contextual modeling. However, CNNs suffer from fixed filter weights and deep convolutional layers, which lead to a limited receptive field and high computational burden. The recent Vision Transformer (ViT) models long-range dependencies with a self-attention mechanism and has been an alternative backbone to the CNNs traditionally used in HSI classification. However, such transformer-based architectures designate all input pixels of the receptive field as feature tokens in terms of feature embedding and self-attention, which inevitably limits the ability for learning multi-scale features and increases the computational cost. To overcome this issue, we propose a local semantic feature aggregation-based transformer (LSFAT) architecture which allows transformers to represent long-range dependencies of multi-scale features more efficiently. We introduce the concept of the homogeneous region into the transformer by considering a pixel aggregation strategy and further propose neighborhood aggregation-based embedding (NAE) and attention (NAA) modules, which are able to adaptively form multi-scale features and capture locally spatial semantics among them in a hierarchical transformer architecture. A reusable classification token is included together with the feature tokens in the attention calculation. In the last stage, a fully connected layer is employed to perform classification on the reusable token after transformer encoding. We verify the effectiveness of the NAE and NAA modules compared with the traditional ViT through extensive experiments. 
Our results demonstrate the excellent classification performance of the proposed method in comparison with other state-of-the-art approaches on several public HSIs.
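
The general idea in the abstract (aggregating neighboring pixels into fewer feature tokens, then letting a classification token attend over them) can be illustrated with a toy sketch. This is not the authors' NAE or NAA module: it assumes fixed, non-overlapping average pooling in place of adaptive aggregation, uses identity projections instead of learned weights, and the function names `aggregate_neighborhoods` and `update_class_token` are hypothetical.

```python
import math


def aggregate_neighborhoods(patch, k):
    """Average-pool non-overlapping k x k windows of an H x W x C patch.

    Assumption: plain average pooling stands in for the adaptive pixel
    aggregation described in the abstract.
    """
    h, w, c = len(patch), len(patch[0]), len(patch[0][0])
    tokens = []
    for i in range(0, h - k + 1, k):
        for j in range(0, w - k + 1, k):
            window = [patch[i + di][j + dj] for di in range(k) for dj in range(k)]
            tokens.append([sum(px[ch] for px in window) / (k * k) for ch in range(c)])
    return tokens


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def update_class_token(cls_token, tokens):
    """Scaled dot-product attention with the class token as the only query.

    Assumption: identity query/key/value projections (no learned weights).
    The class token attends over itself and all aggregated tokens.
    """
    seq = [cls_token] + tokens
    scale = math.sqrt(len(cls_token))
    scores = [sum(q * x for q, x in zip(cls_token, t)) / scale for t in seq]
    weights = softmax(scores)
    return [sum(wt * t[ch] for wt, t in zip(weights, seq)) for ch in range(len(cls_token))]


# Toy 4 x 4 patch with 2 channels: channel 0 holds the row index, channel 1 the column index.
patch = [[[float(r), float(c)] for c in range(4)] for r in range(4)]
tokens = aggregate_neighborhoods(patch, 2)      # 16 pixels become 4 tokens
cls = update_class_token([0.0, 0.0], tokens)    # zero query gives uniform attention
```

With a 2 x 2 neighborhood, the 16 pixels of the toy patch collapse into 4 tokens, so the attention step scores 5 entries (class token plus 4 tokens) rather than 17. A classifier such as the fully connected layer mentioned in the abstract would then operate on the updated class token.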
