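Computer science
Convolutional neural network
Artificial intelligence
Pattern recognition (psychology)
Embedding
Transformer
Pixel
Hyperspectral imaging
Feature learning
Contextual image classification
Image (mathematics)
Quantum mechanics
Physics
Voltage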
Authors
Bing Tu,Xiaolong Liao,Qianming Li,Yishu Peng,Antonio Plaza
Source
Journal: IEEE Transactions on Geoscience and Remote Sensing
[Institute of Electrical and Electronics Engineers]
Date: 2022-01-01
Volume/issue: 60: 1-15
Citations: 29
Identifier
DOI:10.1109/tgrs.2022.3201145
Abstract
Hyperspectral images (HSIs) contain abundant information in the spatial and spectral domains, allowing for a precise characterization of categories of materials. Convolutional neural networks (CNNs) have achieved great success in HSI classification, owing to their excellent ability in local contextual modeling. However, CNNs suffer from fixed filter weights and deep convolutional layers, which lead to a limited receptive field and high computational burden. The recent Vision Transformer (ViT) models long-range dependencies with a self-attention mechanism and has been an alternative backbone to the CNNs traditionally used in HSI classification. However, such transformer-based architectures designate all input pixels of the receptive field as feature tokens in terms of feature embedding and self-attention, which inevitably limits the ability for learning multi-scale features and increases the computational cost. To overcome this issue, we propose a local semantic feature aggregation-based transformer (LSFAT) architecture which allows transformers to represent long-range dependencies of multi-scale features more efficiently. We introduce the concept of the homogeneous region into the transformer by considering a pixel aggregation strategy and further propose neighborhood aggregation-based embedding (NAE) and attention (NAA) modules, which are able to adaptively form multi-scale features and capture locally spatial semantics among them in a hierarchical transformer architecture. A reusable classification token is included together with the feature tokens in the attention calculation. In the last stage, a fully connected layer is employed to perform classification on the reusable token after transformer encoding. We verify the effectiveness of the NAE and NAA modules compared with the traditional ViT through extensive experiments. 
Our results demonstrate the excellent classification performance of the proposed method in comparison with other state-of-the-art approaches on several public HSIs.
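The pipeline outlined in the abstract (aggregate pixels into feature tokens, attend over them together with a classification token, then classify from that token) can be illustrated with a minimal NumPy sketch. This is not the paper's NAE or NAA module: fixed average pooling over non-overlapping windows stands in for the adaptive aggregation, a single attention head replaces the hierarchical encoder, and all names (`aggregate_neighborhoods`, `classify_patch`, `params`) and sizes are hypothetical, with random weights used only to show the shapes involved.

```python
import numpy as np

def aggregate_neighborhoods(patch, window):
    """Average each non-overlapping window x window pixel block into one token.

    Assumption: plain average pooling, a stand-in for adaptive aggregation.
    patch has shape (height, width, bands); returns (num_tokens, bands).
    """
    h, w, c = patch.shape
    assert h % window == 0 and w % window == 0, "patch must tile evenly"
    blocks = patch.reshape(h // window, window, w // window, window, c)
    return blocks.mean(axis=(1, 3)).reshape(-1, c)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over all tokens."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v

def classify_patch(patch, params, window=2):
    """Aggregate pixels, attend with a class token, classify from that token."""
    feature_tokens = aggregate_neighborhoods(patch, window)
    tokens = np.vstack([params["cls"], feature_tokens])  # class token goes first
    encoded = self_attention(tokens, params["wq"], params["wk"], params["wv"])
    logits = encoded[0] @ params["w_out"] + params["b_out"]  # FC layer on class token
    return softmax(logits)

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
bands, dim, n_classes = 8, 8, 4
params = {
    "cls": rng.normal(size=(1, bands)),
    "wq": rng.normal(size=(bands, dim)),
    "wk": rng.normal(size=(bands, dim)),
    "wv": rng.normal(size=(bands, dim)),
    "w_out": rng.normal(size=(dim, n_classes)),
    "b_out": np.zeros(n_classes),
}
patch = rng.normal(size=(6, 6, bands))  # a 6x6 spatial patch with 8 spectral bands
probs = classify_patch(patch, params)
```

With a 6x6 patch and `window=2`, attention runs over 9 aggregated tokens plus the class token instead of 36 pixel tokens plus the class token, which is the kind of cost reduction the abstract attributes to aggregating pixels before attention.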