Multispectral imaging
Modality (human-computer interaction)
Pedestrian detection
Computer vision
Remote sensing
Pedestrian
Artificial intelligence
Computer science
Environmental science
Materials science
Geography
Engineering
Transportation engineering
Authors
Yanhao Liu,Chuan Hu,Baixuan Zhao,Yonghui Huang,Xi Zhang
Identifier
DOI:10.1109/tiv.2024.3367688
Abstract
Multispectral pedestrian detection based on RGB-thermal (RGB-T) cameras has been actively studied in autonomous driving in recent years owing to its robustness in complex traffic scenes. However, the fusion of multispectral data poses several challenges. First, the fusion method requires dynamic adjustment of fusion weights to account for environmental influences such as illumination and temperature. Second, effective feature fusion must address the slight misalignment between visual sensors and enhance the features of inconspicuous targets in traffic scenes. To solve the problems above, we propose a novel network with three effective modules. In contrast to previous global fusion-weight methods, the region-based illumination and temperature aware (RITA) module is designed as a dual-pipeline structure that generates five regional fusion weights, comprehensively capturing both global and regional environmental information. In addition, compared with previous one-stage fusion strategies, a two-stage refined modality fusion is realized by two modules. The spatial-aligned modal fusion (SAMF) module generates fusion features with large-scale spatial attention masks, which enhance corresponding features and alleviate the slight misalignment between modalities. The object-correlated cross-modality enhancement (OCE) module complements the fused modality with effective features by establishing inter-pedestrian relationships and enhancing the features of inconspicuous pedestrians. On the two challenging multispectral pedestrian datasets KAIST and CVC-14, our method achieves average miss rates of 7.64% and 21.3%, respectively, and outperforms the competitive BAANet by 10.35% in the miss rate of distant pedestrians on KAIST, demonstrating its advantages over state-of-the-art methods.
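To illustrate the regional fusion-weight idea described in the abstract, the following is a minimal NumPy sketch. It is not the paper's implementation: the region layout (equal vertical strips), the `regional_fusion` function, and the per-region two-way softmax weighting are all illustrative assumptions, standing in for the RITA module's learned weight generation.

```python
import numpy as np

def regional_fusion(rgb_feat, thermal_feat, region_logits):
    """Fuse RGB and thermal feature maps with per-region weights.

    rgb_feat, thermal_feat: (C, H, W) feature maps.
    region_logits: (R, 2) raw scores per region for (rgb, thermal).
    Regions are modeled here as R equal vertical strips of the map,
    a simplifying assumption (the paper's exact layout may differ).
    """
    C, H, W = rgb_feat.shape
    R = region_logits.shape[0]

    # Softmax over the two modalities so each region's weights sum to 1.
    e = np.exp(region_logits - region_logits.max(axis=1, keepdims=True))
    w = e / e.sum(axis=1, keepdims=True)           # shape (R, 2)

    fused = np.empty_like(rgb_feat)
    bounds = np.linspace(0, W, R + 1).astype(int)  # strip boundaries
    for r in range(R):
        cols = slice(bounds[r], bounds[r + 1])
        fused[:, :, cols] = (w[r, 0] * rgb_feat[:, :, cols]
                             + w[r, 1] * thermal_feat[:, :, cols])
    return fused, w

# Toy example: constant feature maps, 5 regions favoring the RGB modality.
rgb = np.ones((8, 16, 20))
thermal = np.zeros((8, 16, 20))
fused, w = regional_fusion(rgb, thermal, np.array([[2.0, 0.0]] * 5))
```

In a real network the `region_logits` would be predicted from environmental cues (illumination, temperature) rather than fixed, and the convex per-region combination lets different parts of the scene lean on different modalities, e.g. thermal in dark regions.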