Journal: IEEE Transactions on Cognitive and Developmental Systems [Institute of Electrical and Electronics Engineers]  Date: 2023-01-19  Volume/Issue: 15 (4): 2132-2143  Cited by: 7
Identifier
DOI: 10.1109/tcds.2023.3238181
Abstract
Object detection is essential for an autonomous driving sensing system. Since lighting conditions vary in unconstrained scenarios, detection accuracy based on visible images alone can be greatly degraded. Although accuracy can be improved by fusing visible and infrared images, existing multispectral object detection (MOD) algorithms suffer from inadequate intermodal interaction and a lack of global dependence in the fusion approach. Thus, we propose an MOD framework called YOLO-MS by designing a feature interaction and self-attention fusion network (FISAFN) as the backbone network. Within the FISAFN, correlations between the two modalities are extracted by the feature interaction module (FIM), which reconstructs the information components of each modality and enhances the capability of information exchange. To filter redundant features and enhance complementary features, long-range information dependence between the two modalities is established by a self-attention feature fusion module (SAFFM), yielding fused features with greater information richness. Experimental results on the FLIR-aligned and M3FD data sets demonstrate that the proposed YOLO-MS performs favorably against state-of-the-art approaches, including feature-level fusion and pixel-level fusion methods. Furthermore, YOLO-MS maintains good detection performance under diverse scene conditions.
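To make the fusion idea concrete, below is a minimal sketch of how a self-attention fusion step over visible and infrared feature maps might look. This is an illustrative assumption only: the class name SelfAttentionFusion, the channel width, the head count, and the 1x1-conv merge are not taken from the paper, whose actual SAFFM and FIM designs are not specified in this abstract.

```python
# Hypothetical sketch of self-attention fusion across two modalities (not the paper's SAFFM).
import torch
import torch.nn as nn


class SelfAttentionFusion(nn.Module):
    """Fuse visible and infrared feature maps via self-attention over both token sets."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)
        self.merge = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, vis: torch.Tensor, ir: torch.Tensor) -> torch.Tensor:
        # vis, ir: (B, C, H, W) feature maps from the visible and infrared branches.
        b, c, h, w = vis.shape
        # Flatten each map to a token sequence and stack both modalities: (B, 2*H*W, C).
        tokens = torch.cat(
            [vis.flatten(2).transpose(1, 2), ir.flatten(2).transpose(1, 2)], dim=1
        )
        # Every token attends to tokens of both modalities, modeling the
        # long-range intermodal dependence described in the abstract.
        attended, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + attended)
        # Split back into per-modality maps and merge them with a 1x1 convolution.
        vis_out, ir_out = tokens.chunk(2, dim=1)
        vis_out = vis_out.transpose(1, 2).reshape(b, c, h, w)
        ir_out = ir_out.transpose(1, 2).reshape(b, c, h, w)
        return self.merge(torch.cat([vis_out, ir_out], dim=1))


if __name__ == "__main__":
    fuse = SelfAttentionFusion(channels=64)
    vis = torch.randn(1, 64, 16, 16)
    ir = torch.randn(1, 64, 16, 16)
    print(fuse(vis, ir).shape)  # torch.Size([1, 64, 16, 16])
```

In this sketch the attention output replaces a plain concatenation-plus-convolution fusion, which is one common way to inject global dependence between modalities; the paper's backbone presumably integrates such a module at one or more stages of the detector.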