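Computer science
Point cloud
Artificial intelligence
Object detection
Voxel
Transformer
Computer vision
Cascade
Pyramid (geometry)
Pattern recognition (psychology)
Voltage
Mathematics
Chromatography
Quantum mechanics
Geometry
Physics
Chemistry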
Authors
Xinglong Li, Xiaowei Zhang
Identifier
DOI:10.1007/978-981-99-8435-0_24
Abstract
Recently, Transformers have been widely applied in 3-D object detection to model global contextual relationships in point cloud collections or to refine proposals. However, the structural information in 3-D point clouds is often incomplete, especially for distant and small objects, which makes accurate detection with these methods difficult. To address this issue, we propose a Cascaded Transformer based on a Dynamic Voxel Pyramid (called CasFormer) for 3-D object detection from LiDAR point clouds. Specifically, we dynamically spread relevant features from the voxel pyramid based on the sparsity of each region of interest (RoI), capturing richer semantic information for structurally incomplete objects. Furthermore, a cross-stage attention mechanism cascades the refined results of the Transformer stage by stage and improves the training convergence of the Transformer. Extensive experiments demonstrate that CasFormer achieves progressive performance on the KITTI dataset and the Waymo Open Dataset. On the KITTI online 3-D object detection leaderboard, our method outperforms CT3D by 1.12% and 1.27% at the moderate and hard levels of car detection, respectively.
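The abstract does not say how RoI sparsity is measured or how it steers feature gathering from the voxel pyramid. As an illustration only, the sketch below assumes that sparsity is the number of LiDAR points inside an axis-aligned RoI and that sparser RoIs draw on additional, coarser pyramid levels. The function names, the box format, and the thresholds are all hypothetical and are not taken from CasFormer.

```python
def roi_point_count(points, roi):
    """Count points inside an axis-aligned box.

    points: iterable of (x, y, z) tuples.
    roi: (xmin, ymin, zmin, xmax, ymax, zmax); hypothetical box format.
    """
    xmin, ymin, zmin, xmax, ymax, zmax = roi
    return sum(
        1
        for x, y, z in points
        if xmin <= x <= xmax and ymin <= y <= ymax and zmin <= z <= zmax
    )


def select_pyramid_levels(n_points, thresholds=(64, 16)):
    """Pick which voxel-pyramid levels to gather features from.

    Level 0 (finest) is always used. Each threshold the point count
    falls below adds one coarser level, so sparse RoIs see more context.
    The threshold values are illustrative assumptions.
    """
    levels = [0]
    for level, threshold in enumerate(thresholds, start=1):
        if n_points < threshold:
            levels.append(level)
    return levels


# Example: a sparse RoI holding two of three points.
points = [(0.5, 0.5, 0.5), (1.0, 1.0, 1.0), (5.0, 5.0, 5.0)]
roi = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
count = roi_point_count(points, roi)
levels = select_pyramid_levels(count)
```

Under these assumptions a dense, nearby object uses only the finest level, while a distant object with few returns also pulls features from coarser levels. That is one plausible reading of "dynamically spread relevant features", not a description of the authors' implementation.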