人工智能
计算机科学
高光谱成像
特征提取
模式识别(心理学)
变压器
计算机视觉
像素
作者
Xiaoyue Chen,Sei-Ichiro Kamata,Weilian Zhou
标识
DOI:10.1109/tencon54134.2021.9707289
摘要
Hyperspectral image classification (HSIC) is a task assigning the correct label to each pixel. It is a hot topic in the remote sensing field, which has been processed in several deep learning methods. Recently, there are some works that apply Vision Transformer (ViT) methods to the HSIC task, but the performance is not as good as some CNN-structured methods, considering that Vision Transformer uses attention to capture global information but ignores local characteristics. In this paper, a multi-stage Vision Transformer model referring to the feature extraction structure of CNN is proposed, and the result shows the realizability and reliability. Besides, experiments show that the modified ViT structure needs more samples for training. An innovative data augmentation method is used to generate extended samples with virtual yet reliable labels. The generated samples are combined with the original ones as the stacked samples, which are used for the following feature extraction process. Experiments explain the optimization of the multi-stage Vision Transformer structure with stacked samples in the accuracy term compared with other methods.
科研通智能强力驱动
Strongly Powered by AbleSci AI