Multispectral image
Computer science
Artificial intelligence
Computer vision
Object detection
Object (grammar)
Pattern recognition (psychology)
Authors
Sijie Hu,Fabien Bonardi,Samia Bouchafa,Helmut Prendinger,Désiré Sidibé
Source
Journal: IEEE Transactions on Intelligent Transportation Systems
[Institute of Electrical and Electronics Engineers]
Date: 2024-06-19
Volume/Issue: 25 (11): 16300-16311
Identifier
DOI: 10.1109/tits.2024.3412417
Abstract
Data from different modalities, such as infrared and visible images, can offer complementary information, and integrating such information can significantly enhance the capabilities of a system to perceive and recognize its surroundings. Thus, multi-modal object detection has widespread applications, particularly in challenging weather conditions like low-light scenarios. The core of multi-modal fusion lies in developing a reasonable fusion strategy, which can fully exploit the complementary features of different modalities while preventing a significant increase in model complexity. To this end, this paper proposes a novel lightweight cross-fusion module named Channel-Patch Cross Fusion (CPCF), which leverages Channel-wise Cross-Attention (CCA), Patch-wise Cross-Attention (PCA) and Adaptive Gating (AG) to encourage mutual rectification among different modalities. This process simultaneously explores commonalities across modalities while maintaining the uniqueness of each modality. Furthermore, we design a versatile intermediate fusion framework that can leverage CPCF to enhance the performance of multi-modal object detection. The proposed method is extensively evaluated on multiple public multi-modal datasets, namely FLIR, LLVIP, and DroneVehicle. The experiments indicate that our method yields consistent performance gains across various benchmarks and can be extended to different types of detectors, further demonstrating its robustness and generalizability. Our codes are available at https://github.com/Superjie13/CPCFMultispectral.
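As an illustration only, the general idea of channel-wise cross-attention followed by adaptive gating can be sketched in plain NumPy. This is a hypothetical, heavily simplified sketch of the concepts named in the abstract, not the authors' CPCF implementation (which also includes patch-wise cross-attention; see the linked repository for the actual code). The function names, the (channels × positions) feature layout, and the mean-based gate are all assumptions made for this toy example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_cross_attention(f_a, f_b):
    """Toy channel-wise cross-attention: rectify modality A's features
    using channel-to-channel affinities computed against modality B.
    f_a, f_b: (C, N) arrays -- C channels, N flattened spatial positions."""
    affinity = softmax(f_a @ f_b.T / np.sqrt(f_a.shape[1]), axis=-1)  # (C, C)
    return affinity @ f_b  # cross-rectified features, (C, N)

def adaptive_gate(f_a, f_b_rect):
    """Toy adaptive gating: a per-channel sigmoid gate blends the original
    features with the cross-rectified ones (gate design is an assumption)."""
    g = 1.0 / (1.0 + np.exp(-(f_a.mean(axis=1) - f_b_rect.mean(axis=1))))  # (C,)
    return g[:, None] * f_a + (1.0 - g[:, None]) * f_b_rect

rng = np.random.default_rng(0)
rgb = rng.random((8, 16))  # 8 channels, 4x4 spatial grid flattened to 16
ir = rng.random((8, 16))   # infrared features of the same shape
fused = adaptive_gate(rgb, channel_cross_attention(rgb, ir))
print(fused.shape)  # (8, 16): fused features keep the input shape
```

The key property of the real module that this sketch tries to convey is that attention is taken over channels rather than spatial tokens, so the cost scales with C² instead of N², which is what keeps such fusion modules lightweight.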