Artificial intelligence
Computer science
Computer vision
RGB color model
Pyramid (geometry)
Transformer
Feature (linguistics)
Pattern recognition (psychology)
Modal verb
Object detection
Feature extraction
Pattern
Authors
Yaohui Zhu, Xiaoyu Sun, Miao Wang, Hua Huang
Source
Journal: IEEE Transactions on Intelligent Transportation Systems
[Institute of Electrical and Electronics Engineers]
Date: 2023-04-19
Volume/Issue: 24 (9): 9984-9995
Citations: 10
Identifier
DOI:10.1109/tits.2023.3266487
Abstract
RGB-Infrared multi-modal object detection exploits diverse and complementary information, offering advantages in the intelligent transportation field. The main challenge of RGB-Infrared object detection is how to fuse the two modalities. The difficulty of fusion is reflected in two aspects: 1) large visual differences between modalities make it difficult to learn effective complementary features, and 2) misaligned RGB-Infrared image pairs further increase the difficulty of fusion. To this end, building on the feature pyramid commonly used in object detection, we propose the Multi-modal Feature Pyramid Transformer (MFPT) to fuse the two modalities. MFPT learns semantic and modal complementary information to enhance the features of each modality via an intra-modal feature pyramid transformer and an inter-modal feature pyramid transformer. The intra-modal feature pyramid transformer lets features interact across space and scales, improving the semantic representations within each modality. The inter-modal feature pyramid transformer conducts feature interaction between modalities, enabling each modality to learn complementary features from the other. Meanwhile, the inter-modal feature pyramid transformer also learns distance-independent dependencies between modalities, which are insensitive to misaligned images. Furthermore, a local attention mechanism over different windows is introduced into MFPT to achieve efficient correlation between regions of different scales or modalities. Experimental results on two RGB-Infrared detection datasets demonstrate that the proposed method outperforms state-of-the-art methods.
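The inter-modal interaction the abstract describes resembles standard cross-attention: tokens of one modality form the queries and attend to keys/values drawn from the other modality, with a residual connection preserving the original features. A minimal single-head NumPy sketch under those assumptions (function and variable names here are illustrative, not the authors' implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def inter_modal_attention(queries, keys_values):
    """Cross-attention fusion sketch: tokens of one modality (queries)
    attend to tokens of the other modality (keys_values).
    Shapes: (num_tokens, dim). Returns fused features for the query modality."""
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)  # (Nq, Nkv) similarities
    attn = softmax(scores, axis=-1)                # each row sums to 1
    complementary = attn @ keys_values             # aggregate the other modality
    return queries + complementary                 # residual keeps own features

# Toy example: 16 flattened spatial tokens per modality, 32-dim features.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((16, 32))
ir = rng.standard_normal((16, 32))
fused_rgb = inter_modal_attention(rgb, ir)  # RGB enhanced by infrared cues
fused_ir = inter_modal_attention(ir, rgb)   # infrared enhanced by RGB cues
```

Because the attention weights depend only on feature similarity rather than spatial offsets, this style of interaction tolerates modest misalignment between the RGB and infrared images, which is consistent with the abstract's "distance-independent dependencies" claim.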