Symbol
Pixel
Algorithm
Computer science
Artificial intelligence
Mathematics
Arithmetic
Authors
Yan Zhang, Xiyuan Gao, Qingyan Duan, Jiaxu Leng, Xiao Pu, Xinbo Gao
Source
Journal: IEEE Transactions on Neural Networks and Learning Systems
[Institute of Electrical and Electronics Engineers]
Date: 2023-10-04
Pages: 1-15
Citations: 3
Identifiers
DOI: 10.1109/TNNLS.2023.3319363
Abstract
Very high-resolution (VHR) remote sensing (RS) image classification is a fundamental task for RS image analysis and understanding. Recently, Transformer-based models have demonstrated outstanding potential for learning high-order contextual relationships from natural images of ordinary resolution (≈ 224 × 224 pixels) and have achieved remarkable results on general image classification tasks. However, the complexity of the naive Transformer grows quadratically with image size, which prevents Transformer-based models from being applied to VHR RS image (≥ 500 × 500 pixels) classification and other computationally expensive downstream tasks. To this end, we propose to decompose the expensive self-attention (SA) into real and imaginary parts via the discrete Fourier transform (DFT), yielding an efficient complex SA (CSA) mechanism. Benefiting from the conjugate symmetry of the DFT, CSA can model high-order contextual information with less than half the computation of naive SA. To overcome gradient explosion in the complex Fourier field, we replace the Softmax function with a carefully designed Logmax function to normalize the attention map of CSA and stabilize gradient propagation. By stacking multiple layers of CSA blocks, we propose the Fourier complex Transformer (FCT) model to learn global contextual information from VHR aerial images in a hierarchical manner. Extensive experiments on commonly used RS classification datasets demonstrate the effectiveness and efficiency of FCT, especially on VHR RS images. The source code of FCT will be available at https://github.com/Gao-xiyuan/FCT.
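To make the idea in the abstract concrete, here is a minimal sketch of a complex self-attention layer in the spirit of CSA: the token sequence is moved into the Fourier domain with a real FFT (whose conjugate symmetry keeps only about half of the spectrum, which is where the claimed saving comes from), attention is computed over frequency bins, and the attention map is normalized with a log-based function instead of Softmax. The exact Logmax definition, the layer layout, and every name below (`logmax`, `ComplexSelfAttention`) are assumptions for illustration only; the authors' actual FCT implementation is only available from the linked repository.

```python
# Hypothetical sketch of the complex self-attention (CSA) idea described in the
# abstract. The concrete Logmax formula and block structure of FCT are not given
# on this page, so log(1 + |.|) normalization and the rfft-over-tokens layout
# below are assumptions, not the paper's definition.
import torch
import torch.nn as nn


def logmax(scores: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Assumed stand-in for the paper's Logmax: a log-scaled normalization that
    keeps attention weights bounded instead of exponentiating complex-valued
    magnitudes the way Softmax would."""
    weights = torch.log1p(scores.abs())                    # log(1 + |score|) >= 0
    return weights / weights.sum(dim=dim, keepdim=True).clamp_min(1e-6)


class ComplexSelfAttention(nn.Module):
    """Sketch of CSA: project tokens, apply a real FFT along the token axis
    (conjugate symmetry keeps ~n/2 + 1 frequency bins), attend over those bins,
    then transform back to the token domain."""

    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n = x.shape[1]                                      # x: (batch, tokens, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        qf = torch.fft.rfft(q, dim=1)                       # (batch, n//2 + 1, dim), complex
        kf = torch.fft.rfft(k, dim=1)
        vf = torch.fft.rfft(v, dim=1)
        # Attention scores between frequency bins rather than between tokens.
        scores = torch.einsum("bnd,bmd->bnm", qf, kf.conj()) * self.scale
        attn = logmax(scores, dim=-1)                       # real-valued, rows sum to 1
        out_f = torch.einsum("bnm,bmd->bnd", attn.to(vf.dtype), vf)
        out = torch.fft.irfft(out_f, n=n, dim=1)            # back to n tokens
        return self.proj(out)
```

As a rough illustration of the cost argument, `ComplexSelfAttention(dim=64)(torch.randn(2, 196, 64))` forms its attention map over 99 frequency bins rather than 196 tokens, so the quadratic attention term operates on roughly half-length sequences; whether this matches the paper's exact accounting is not verifiable from the abstract alone.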