M2FNet: Multi-modal fusion network for object detection from visible and thermal infrared images

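Artificial intelligence; Modal verb; Computer science; Convolutional neural network; Pattern recognition (psychology); Pairwise comparison; Robustness (evolution); Computer vision; Hyperspectral imaging; Metric (unit); Engineering; Materials science; Polymer chemistry; Biochemistry; Chemistry; Operations management; Gene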

Authors

Chenchen Jiang, Huazhong Ren, Hong Yang, Hongtao Huo, Pengfei Zhu, Zhaoyuan Yao, Jing Li, Min Sun, Shihao Yang

Source

Journal: International Journal of Applied Earth Observation and Geoinformation; Date: 2024-05-23; Volume: 130, Article: 103918; Citations: 12

Identifier

DOI: 10.1016/j.jag.2024.103918

Abstract

Fusing multi-modal information from visible (VIS) and thermal infrared (TIR) images is crucial for object detection that adapts fully to varied lighting conditions. However, existing models usually treat VIS and TIR images as independent information and extract the corresponding features with separate networks, owing to the scarcity of training data with labeled instances from registered VIS and TIR image pairs. To fill this gap, this paper proposes a novel Multi-Modal Fusion NETwork (M2FNet) based on the Transformer architecture, which contains two modules: Union-Modal Attention (UMA) and Cross-Modal Attention (CMA). The UMA module aggregates multi-spectral features from VIS and TIR images and then extracts multi-modal features via a convolutional neural network (CNN) backbone. The CMA module learns cross-attention features from pairwise VIS and TIR features using a Transformer architecture. Evaluated with the mean average precision (mAP) metric, M2FNet improves on baseline methods trained using only VIS or only TIR images by 10.71% and 2.97%, respectively. M2FNet also achieves higher mAP than existing multi-modal methods on two public datasets. A sensitivity analysis over eight illumination thresholds shows that M2FNet is robust to varied illumination conditions, with a maximum accuracy increase of 25.6%. Moreover, the method is applied to a new testing dataset, VI2DA (Visible-Infrared paired Video and Image DAtaset), acquired by diverse sensors and platforms to test the generalization ability of object detectors; the dataset will be made publicly available at https://github.com/TIR-OD/Datasets.
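To make the idea of cross-attention between VIS and TIR features concrete, the following is a minimal NumPy sketch of generic scaled dot-product cross-attention, in which tokens from one modality query tokens from the other. It is not the authors' CMA module: the function names, token count, feature dimension, random projection matrices, shared weights across directions, and the residual-plus-concatenation fusion are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Generic scaled dot-product cross-attention (illustrative only).

    Queries come from one modality; keys and values come from the other.
    Returns the attended features and the attention weights.
    """
    q = query_feats @ w_q
    k = context_feats @ w_k
    v = context_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Hypothetical sizes: 16 feature tokens per modality, 32 channels each.
rng = np.random.default_rng(0)
n_tokens, dim = 16, 32
vis = rng.standard_normal((n_tokens, dim))  # stand-in for VIS features
tir = rng.standard_normal((n_tokens, dim))  # stand-in for TIR features

# Random projections stand in for learned weights (an assumption).
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))

# VIS tokens attend to TIR tokens, and vice versa.
vis_attended, vis_to_tir = cross_attention(vis, tir, w_q, w_k, w_v)
tir_attended, tir_to_vis = cross_attention(tir, vis, w_q, w_k, w_v)

# One simple fusion choice (assumed): residual add, then concatenate channels.
fused = np.concatenate([vis + vis_attended, tir + tir_attended], axis=-1)
```

In this sketch each row of `vis_to_tir` is a probability distribution over TIR tokens, so a VIS token in a poorly lit region can draw most of its fused representation from TIR tokens, which matches the motivation given in the abstract for combining the two modalities.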
