Keywords: Computer Science; Artificial Intelligence; Computer Vision; Feature Extraction; Pattern Recognition; Object Detection; Encoder; Visualization; Electrical Engineering
Authors
Jinyan Nie, He Sun, Xu Sun, Li Ni, Lianru Gao
Source
Journal: IEEE Geoscience and Remote Sensing Letters [Institute of Electrical and Electronics Engineers]
Date: 2023-12-04
Volume/Issue: 21: 1-5
Citations: 1
Identifier
DOI: 10.1109/lgrs.2023.3339214
Abstract
Because visible and infrared images are complementary, fusing the two modalities has become an effective way to improve object detection accuracy in remote sensing. However, several problems remain. Most existing algorithms focus on local information and ignore long-range dependencies when extracting features from each modality. Moreover, coarse weighted fusion strategies do not fully exploit the information from different modalities, and such fusion structures ignore the importance of intermodal information exchange. To tackle these problems, we propose a cross-modal feature fusion and interaction strategy for convolutional neural network (CNN)-transformer-based object detection in visible and infrared remote sensing imagery. We adopt a parallel structure to extract the features of each modality separately. In each branch, convolutional layers and transformer encoders are cascaded to capture both local and long-range information. The cross-modal feature fusion and interaction module (CFFIM) uses attention mechanisms to jointly fuse features of the two modalities at the same scale, improving the diversity of the fused features, while the feature interaction enables visible and infrared information to be shared. Experiments on the VEDAI dataset demonstrate the effectiveness of the proposed scheme compared with other state-of-the-art algorithms.
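The paper itself does not give implementation details here, but the fusion-and-interaction idea in the abstract can be sketched as follows. The snippet below is a minimal, hypothetical PyTorch illustration of a cross-modal block in the spirit of the CFFIM: each modality queries the other through multi-head cross-attention (the interaction step), and the exchanged features at the same scale are then fused. The class name, layer sizes, residual connections, and the fusion rule (concatenation followed by a 1x1 convolution) are assumptions for illustration, not the authors' actual design.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Hypothetical sketch of cross-modal feature fusion and interaction
    between visible and infrared feature maps of the same scale."""

    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        # Cross-attention in both directions: each modality attends to the other.
        self.vis_queries_ir = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.ir_queries_vis = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Assumed fusion rule: concatenate and project back with a 1x1 conv.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, vis: torch.Tensor, ir: torch.Tensor) -> torch.Tensor:
        # vis, ir: (B, C, H, W) feature maps from the two parallel branches.
        b, c, h, w = vis.shape
        vis_seq = vis.flatten(2).transpose(1, 2)  # (B, H*W, C)
        ir_seq = ir.flatten(2).transpose(1, 2)    # (B, H*W, C)
        # Interaction: each modality queries the other, sharing information.
        vis_att, _ = self.vis_queries_ir(vis_seq, ir_seq, ir_seq)
        ir_att, _ = self.ir_queries_vis(ir_seq, vis_seq, vis_seq)
        # Residual connections keep each modality's own features.
        vis_out = (vis_seq + vis_att).transpose(1, 2).reshape(b, c, h, w)
        ir_out = (ir_seq + ir_att).transpose(1, 2).reshape(b, c, h, w)
        # Fusion: combine the interacted features at the same scale.
        return self.fuse(torch.cat([vis_out, ir_out], dim=1))

# Example usage on same-scale feature maps from the two branches:
# fused = CrossModalFusion(channels=256)(vis_feat, ir_feat)
```

Cross-attention in both directions, rather than a single weighted sum, is one way to realize the intermodal information exchange the abstract emphasizes; the actual CFFIM may differ in structure and detail.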