Keywords
Multispectral image
Pedestrian detection
Computer science
Locality
Discriminative
Artificial intelligence
Feature
Pattern recognition
Pixel
Computer vision
Modality
Pedestrian
Segmentation
Authors
Yanpeng Cao,Xing Luo,Jiangxin Yang,Yanlong Cao,Michael Ying Yang
Identifier
DOI:10.1016/j.inffus.2022.06.008
Abstract
Multispectral pedestrian detection has received much attention in recent years due to its superiority in detecting targets under adverse lighting/weather conditions. In this paper, we aim to generate highly discriminative multi-modal features by aggregating human-related clues from all available samples present in multispectral images. To this end, we present a novel multispectral pedestrian detector that performs locality-guided cross-modal feature aggregation and pixel-level detection fusion. Given a set of bounding boxes covering pedestrians in both modalities, we deploy two segmentation sub-branches to predict the presence of pedestrians in the visible and thermal channels. Guided by the locality information in the reference modality, we perform cross-modal feature aggregation to learn highly discriminative human-related features in the complementary modality, exploiting the clues of all available pedestrians. Moreover, we utilize the obtained spatial locality maps as prediction confidence scores for the visible and thermal channels and conduct pixel-wise adaptive fusion of the detection results from the two modalities. Extensive experiments demonstrate the effectiveness of our proposed method, which outperforms current state-of-the-art detectors on both the KAIST and CVC-14 multispectral pedestrian detection datasets.
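The pixel-wise adaptive fusion step described above lends itself to a compact illustration. Below is a minimal sketch, assuming the spatial locality maps from the two segmentation sub-branches act directly as per-pixel confidence weights over the visible and thermal detection score maps; the function name, tensor shapes, and softmax weighting are illustrative assumptions, not the authors' exact formulation.

```python
import torch

def pixelwise_adaptive_fusion(det_vis, det_thr, loc_vis, loc_thr):
    """Fuse visible/thermal detection maps with per-pixel confidence weights.

    det_vis, det_thr: (B, 1, H, W) detection score maps per modality.
    loc_vis, loc_thr: (B, 1, H, W) spatial locality maps from the
        segmentation sub-branches, used here as confidence scores.
    """
    # Normalize the two locality maps into per-pixel fusion weights
    # that sum to 1 at every spatial location.
    w = torch.softmax(torch.cat([loc_vis, loc_thr], dim=1), dim=1)
    w_vis, w_thr = w[:, :1], w[:, 1:]
    # Confidence-weighted combination of the modality-specific detections.
    return w_vis * det_vis + w_thr * det_thr
```

Under this reading, a modality whose locality map is weak at a given pixel (e.g., a pedestrian barely visible in the RGB channel at night) contributes proportionally less to the fused detection score at that location.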