Computer science
Object detection
Artificial intelligence
Fusion
RGB color model
Computer vision
Object (grammar)
Sensor fusion
Infrared
Pattern recognition (psychology)
Optics
Linguistics
Physics
Philosophy
Identification
DOI:10.1109/lgrs.2024.3448493
Abstract
In recent years, object detection on visible (RGB) and infrared (IR) images has gained significant attention as a promising solution for robust detection in complex scenarios, especially under low-light conditions. With the help of IR images, object detectors become more reliable and robust in practical conditions by combining RGB and IR information. Despite significant progress in this field, current methods ignore the distinct characteristics of the two modalities when extracting features. RGB images contain detailed texture and color information, meaning they carry many high-frequency signals, whereas IR images have smoother textures and edges but clearer shapes, indicating a significant amount of low-frequency information. These differences between the two modalities must be considered when extracting the corresponding features. To address this issue, we propose a novel network architecture, the frequency mining and complementary fusion network (FMCFNet), which accounts for this intermodal variability. Our network contains two critical modules: the frequency feature extraction (FFE) module and the complementary fusion (CF) module. The FFE module uses filters of varying kernel and pooling sizes to extract features with diverse frequency content and then adaptively selects the most responsive frequency component. The CF module uses the similarity scores generated by cross attention to model the interactions between the two modalities. Comprehensive experimental results demonstrate that our method effectively combines complementary RGB-IR information, achieving robust detection results.
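The two mechanisms the abstract names can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the pooling sizes, the energy-based selection rule, and the single-head attention below are all assumed stand-ins for the FFE module's multi-scale frequency decomposition with adaptive selection and the CF module's similarity-weighted cross-modal fusion.

```python
import numpy as np

def avg_pool(x, k):
    """Average-pool a 2D feature map with kernel/stride k, then upsample
    back to the original size with nearest-neighbour repetition.
    The result is a low-pass (low-frequency) version of x."""
    h, w = x.shape
    hp, wp = h // k, w // k
    pooled = x[:hp * k, :wp * k].reshape(hp, k, wp, k).mean(axis=(1, 3))
    up = np.repeat(np.repeat(pooled, k, axis=0), k, axis=1)
    return up[:h, :w]

def frequency_components(x, kernel_sizes=(2, 4, 8)):
    """Split a feature map into several low-frequency bands (one per
    assumed pooling size) plus the high-frequency residual left after
    removing the finest low-pass band."""
    lows = [avg_pool(x, k) for k in kernel_sizes]
    high = x - lows[0]
    return lows, high

def select_most_responsive(components):
    """Pick the component with the largest mean absolute activation --
    a simple stand-in for the adaptive selection the abstract describes."""
    energies = [np.abs(c).mean() for c in components]
    return components[int(np.argmax(energies))]

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(rgb_tokens, ir_tokens):
    """Fuse two modality token sets: RGB queries attend over IR keys,
    and the resulting similarity scores weight the IR information
    that is added back onto the RGB features."""
    d = rgb_tokens.shape[-1]
    scores = softmax(rgb_tokens @ ir_tokens.T / np.sqrt(d), axis=-1)
    return rgb_tokens + scores @ ir_tokens
```

Note that the low-frequency bands plus the residual reconstruct the input exactly (`lows[0] + high == x`), so the decomposition loses no information; selection only decides which band drives the downstream features.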