Keywords: Computer Vision, Artificial Intelligence, Computer Science, Object Detection, Object (grammar), Segmentation
Authors
Shangjie Li, Keke Geng, Guodong Yin, Ziwei Wang, Min Qian
Source
Journal: IEEE Transactions on Industrial Informatics [Institute of Electrical and Electronics Engineers]
Date: 2023-04-07
Volume/Issue: 20 (1): 845-853
Citations: 9
Identifier
DOI: 10.1109/TII.2023.3263274
Abstract
Object detection in 3-D space is a fundamental technology in autonomous driving systems. Among published 3-D object detection methods, single-modal methods based on point clouds have been widely studied. One problem exposed by these methods is that point clouds lack color and texture features; this limitation in conveying semantic information often leads to detection failures. In contrast, multimodal methods that fuse images and point clouds may solve this problem, but relevant research is not yet sufficient. In this work, a single-stage multiview multimodal 3-D object detector (MVMM) is proposed, which can naturally and efficiently extract semantic and geometric information from images and point clouds. Specifically, a data-level fusion approach, point cloud coloring, is used to combine information from the camera and LiDAR. Next, an encoder–decoder backbone is devised to extract features from the colored points in the range view. Then, the colored points are concatenated with the range-view features, voxelized, and fed into the point-view bridge for down-sampling. Finally, the down-sampled feature map is used by the bird's-eye-view backbone and the detection head to generate 3-D results based on predefined anchors. In extensive experiments on the KITTI dataset, MVMM achieves competitive performance while running at 27 FPS on a 1080 Ti GPU. In particular, MVMM performs extremely well in difficult scenes (e.g., heavy occlusion and truncation) due to its understanding of the fused information.
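The pipeline described in the abstract begins with data-level fusion (point cloud coloring) followed by a range-view projection. The sketch below illustrates both steps under common KITTI conventions; it is a minimal illustration under stated assumptions, not the paper's actual implementation. The calibration matrix names (P2, R0_rect, Tr_velo_to_cam) follow the KITTI devkit, and the range-image resolution and vertical field of view are assumed values.

```python
# Minimal sketch (not the paper's code) of the two input-preparation steps
# described in the abstract: coloring LiDAR points with camera RGB, then
# scattering the colored points into a range-view image.
import numpy as np

def color_point_cloud(points, image, P2, R0_rect, Tr_velo_to_cam):
    """Attach RGB values to LiDAR points that project inside the image.

    points: (N, 3) xyz in the LiDAR frame.
    image:  (H, W, 3) uint8 RGB image.
    P2 (3x4), R0_rect (4x4), Tr_velo_to_cam (4x4): KITTI calibration matrices.
    Returns an (M, 6) array of [x, y, z, r, g, b] for visible points.
    """
    n = points.shape[0]
    pts_h = np.hstack([points, np.ones((n, 1))])     # homogeneous LiDAR coords
    cam = R0_rect @ Tr_velo_to_cam @ pts_h.T         # rectified camera frame, (4, N)
    front = cam[2, :] > 0.1                          # keep points in front of the camera
    proj = P2 @ cam                                  # pixel-plane projection, (3, N)
    z = np.maximum(proj[2, :], 1e-6)                 # guard against divide-by-zero
    u, v = proj[0, :] / z, proj[1, :] / z
    h, w = image.shape[:2]
    inside = front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    rgb = image[v[inside].astype(int), u[inside].astype(int)] / 255.0
    return np.hstack([points[inside], rgb])

def to_range_view(colored_points, fov_up=3.0, fov_down=-25.0, H=64, W=512):
    """Scatter colored points into an (H, W, 7) range image by spherical
    projection. The field of view matches a Velodyne HDL-64E; H and W are
    assumed, since the abstract does not give the range-view resolution."""
    xyz = colored_points[:, :3]
    depth = np.linalg.norm(xyz, axis=1)
    yaw = np.arctan2(xyz[:, 1], xyz[:, 0])
    pitch = np.arcsin(xyz[:, 2] / np.maximum(depth, 1e-6))
    fu, fd = np.radians(fov_up), np.radians(fov_down)
    u = (0.5 * (1.0 - yaw / np.pi) * W).astype(int) % W              # azimuth -> column
    v = np.clip((fu - pitch) / (fu - fd) * H, 0, H - 1).astype(int)  # elevation -> row
    range_img = np.zeros((H, W, colored_points.shape[1] + 1), dtype=np.float32)
    range_img[v, u, 0] = depth                  # channel 0: per-pixel range
    range_img[v, u, 1:] = colored_points        # channels 1..6: xyz + rgb
    return range_img
```

In MVMM, a tensor like the one produced by to_range_view would feed the encoder–decoder range-view backbone; the exact channel layout used by the authors is not specified in the abstract.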