Multimodal PointPillars for Efficient Object Detection in Autonomous Vehicles
Computer Science
Computer Vision
Artificial Intelligence
Object (grammar)
Authors
M. F. Oliveira, Ricardo Cerqueira, João Ribeiro Pinto, Joaquim Fonseca, Luís F. Teixeira
Source
Journal: IEEE Transactions on Intelligent Vehicles [Institute of Electrical and Electronics Engineers] | Date: 2024-01-01 | Pages: 1-11 | Cited by: 2
Identifier
DOI: 10.1109/tiv.2024.3409409
Abstract
Autonomous Vehicles aim to understand their surrounding environment by detecting relevant objects in the scene, which can be performed using a combination of sensors. Accurately detecting pedestrians is particularly challenging, since existing algorithms struggle with small objects. This work studies and addresses this often overlooked problem by proposing Multimodal PointPillars (M-PP), a fast and effective novel fusion architecture for 3D object detection. Inspired by both MVX-Net and PointPillars, image features from a 2D CNN-based feature map are fused with the 3D point cloud in an early fusion architecture. By replacing the heavy 3D convolutions of MVX-Net with a set of convolutional layers in 2D space, and by combining LiDAR and image information at an early stage, M-PP considerably improves inference time over the baseline, running at 28.49 Hz. It achieves inference speeds suitable for real-world applications while keeping the high performance of multimodal approaches. Extensive experiments show that our proposed architecture outperforms both MVX-Net and PointPillars for the pedestrian class in the KITTI 3D object detection dataset, with 62.78% $AP_{BEV}$ (moderate difficulty), while also outperforming MVX-Net on the nuScenes dataset. Moreover, experiments were conducted to measure detection performance as a function of object distance. M-PP surpassed the other methods in pedestrian detection at every distance, particularly for faraway objects (more than 30 meters). Qualitative analysis shows that M-PP visibly outperforms MVX-Net for pedestrians and cyclists, while simultaneously making accurate predictions for cars.
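To make the early-fusion idea described in the abstract concrete, the sketch below illustrates one common way to "decorate" LiDAR points with image features before a PointPillars-style 2D backbone: each point is projected into the camera image and the 2D CNN feature map is sampled at that location. This is a minimal illustrative sketch of the general technique (in the spirit of MVX-Net's PointFusion), not the authors' implementation; the function names, tensor shapes, calibration matrix, and the omitted pillarization step are all assumptions.

```python
# Illustrative early-fusion "point decoration" sketch (assumed, not the paper's code).
import torch
import torch.nn.functional as F


def decorate_points(points, image_features, lidar_to_image):
    """Append sampled image features to each LiDAR point (early fusion).

    points:         (N, 4) tensor of x, y, z, reflectance in the LiDAR frame.
    image_features: (C, H, W) feature map from a 2D CNN over the camera image.
    lidar_to_image: (3, 4) LiDAR-to-pixel projection matrix (from calibration).
    Returns:        (N, 4 + C) decorated points.
    """
    n = points.shape[0]
    # Project points into the image plane (homogeneous coordinates).
    xyz1 = torch.cat([points[:, :3], torch.ones(n, 1)], dim=1)          # (N, 4)
    uvw = xyz1 @ lidar_to_image.T                                        # (N, 3)
    uv = uvw[:, :2] / uvw[:, 2:].clamp(min=1e-6)                         # pixel coords

    c, h, w = image_features.shape
    # Normalize pixel coordinates to [-1, 1] for bilinear sampling with grid_sample.
    grid = torch.stack([uv[:, 0] / (w - 1) * 2 - 1,
                        uv[:, 1] / (h - 1) * 2 - 1], dim=1).view(1, 1, n, 2)
    sampled = F.grid_sample(image_features.unsqueeze(0), grid,
                            align_corners=True)                          # (1, C, 1, N)
    sampled = sampled.squeeze(0).squeeze(1).T                            # (N, C)
    return torch.cat([points, sampled], dim=1)


if __name__ == "__main__":
    pts = torch.randn(1000, 4)          # dummy LiDAR points
    feat = torch.randn(64, 96, 312)     # dummy CNN feature map
    proj = torch.randn(3, 4)            # placeholder calibration matrix
    print(decorate_points(pts, feat, proj).shape)  # torch.Size([1000, 68])
```

The decorated points would then be grouped into vertical pillars and processed by 2D convolutions, as in PointPillars, which is what avoids the heavy 3D convolutions used by MVX-Net.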