Keywords
LiDAR
Computer vision
Artificial intelligence
Computer science
Sensor fusion
Point cloud
Detector
Object detection
Robustness
Feature extraction
Remote sensing
Pattern recognition
Authors
Jian Wang, Fan Li, Yi An, Xuchong Zhang, Hongbin Sun
Source
Journal: IEEE Transactions on Circuits and Systems for Video Technology [Institute of Electrical and Electronics Engineers]
Date: 2024-02-16
Volume/Issue: 34 (7): 5753-5764
Citations: 56
Identifier
DOI: 10.1109/tcsvt.2024.3366664
Abstract
LiDAR and camera are two critical sensors that provide complementary information for accurate 3D object detection. Most existing works focus on improving the detection performance of fusion models on clean, well-curated datasets. However, the point clouds and images collected in real scenarios may be corrupted to varying degrees by sensor malfunctions, which greatly degrades the robustness of the fusion model and poses a threat to safe deployment. In this paper, we first analyze the shortcomings of most fusion detectors, which rely mainly on the LiDAR branch, and the potential of the bird's-eye-view (BEV) paradigm for handling partial sensor failures. Based on this analysis, we present a robust LiDAR-camera fusion pipeline in a unified BEV space with two novel designs addressing four typical LiDAR-camera malfunction cases. Specifically, a mutual deformable attention is proposed to dynamically model the spatial feature relationship and reduce the interference caused by the corrupted modality, and a temporal aggregation module is devised to fully exploit the rich information in the temporal domain. Together with decoupled feature extraction for each modality and holistic BEV-space fusion, the proposed detector, termed RobBEV, works stably regardless of single-modality data corruption. Extensive experiments on the large-scale nuScenes dataset under robust settings demonstrate the effectiveness of our approach.
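The core idea of fusing the two modalities in a shared BEV grid while down-weighting a corrupted branch can be sketched as a toy per-cell weighting between the two BEV feature maps. This is a minimal illustrative sketch only: the function name and the magnitude-based weighting are assumptions for exposition, not the paper's actual mutual deformable attention or temporal aggregation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mutual_attention_fuse(bev_lidar, bev_cam):
    """Toy cross-modal fusion in a unified BEV space.

    Each BEV cell's fused feature is a convex mix of the two modalities,
    with weights derived from per-cell feature magnitude, so a corrupted
    (near-zero) modality contributes less. This is a hypothetical
    simplification, not the paper's mutual deformable attention.
    """
    # bev_lidar, bev_cam: (H, W, C) feature maps on the same BEV grid.
    stacked = np.stack([bev_lidar, bev_cam], axis=0)      # (2, H, W, C)
    scores = np.linalg.norm(stacked, axis=-1)             # (2, H, W)
    weights = softmax(scores, axis=0)[..., None]          # (2, H, W, 1)
    return (weights * stacked).sum(axis=0)                # (H, W, C)

# Example: camera branch fully corrupted (all zeros), as in a camera-failure
# case -> the fused feature is dominated by the intact LiDAR branch.
H, W, C = 4, 4, 8
lidar = np.random.default_rng(0).normal(size=(H, W, C))
cam_failed = np.zeros((H, W, C))
fused = mutual_attention_fuse(lidar, cam_failed)
```

Because the per-cell weights sum to one across modalities, a zeroed-out branch receives low weight everywhere and the output stays close to the surviving modality's features, which mirrors the robustness behavior the abstract claims for single-modality corruption.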