Concepts: Computer Science, Artificial Intelligence, Point Cloud, Computer Vision, Encoder, Object, Video Tracking, Similarity (Geometry), Tracking, Sparse Approximation, Representation, Pattern Recognition, Image (Mathematics)
Authors
Yubo Cui, Jiayao Shan, Zuoxu Gu, Zhiheng Li, Zheng Fang
Source
Journal: IEEE Robotics and Automation Letters
Date: 2022-10-01
Volume/Issue: 7 (4): 11926-11933
Citations: 6
Identifier
DOI: 10.1109/lra.2022.3208687
Abstract
3D single object tracking is a key task in 3D computer vision. However, the sparsity of point clouds makes it difficult to compute similarity and locate the object, posing significant challenges to 3D trackers. Previous works attempted to solve this problem and improved tracking performance in common scenarios, but they usually fail in extremely sparse scenarios, such as tracking objects at long range or under partial occlusion. To address these problems, in this letter we propose a sparse-to-dense, transformer-based framework for 3D single object tracking. First, we transform the sparse 3D points into 3D pillars and then compress them into 2D bird's eye view (BEV) features to obtain a dense representation. Then, we propose an attention-based encoder that computes global similarity between the template and search branches, which alleviates the influence of sparsity. Meanwhile, the encoder applies attention to multi-scale features to compensate for the information loss caused by point-cloud sparsity and single-scale features. Finally, we track the object via set prediction using a two-stage decoder that also relies on attention. Extensive experiments show that our method achieves very promising results on the KITTI and NuScenes datasets.
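The two core ideas in the abstract, densifying sparse points into a BEV grid and computing global template-to-search similarity with attention, can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: the grid ranges, pillar size, max-height pooling, and the single-head dot-product attention are all assumptions made for the sketch.

```python
import numpy as np

def points_to_bev(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0),
                  pillar_size=0.5):
    """Scatter (N, 3) points into vertical pillars and compress them into a
    dense 2D BEV map (here: max point height per pillar, a simplification)."""
    nx = int((x_range[1] - x_range[0]) / pillar_size)
    ny = int((y_range[1] - y_range[0]) / pillar_size)
    bev = np.zeros((nx, ny), dtype=np.float32)
    ix = ((points[:, 0] - x_range[0]) / pillar_size).astype(int)
    iy = ((points[:, 1] - y_range[0]) / pillar_size).astype(int)
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)  # drop out-of-range points
    for xi, yi, z in zip(ix[keep], iy[keep], points[keep, 2]):
        bev[xi, yi] = max(bev[xi, yi], z)
    return bev

def cross_attention(template, search):
    """Single-head scaled dot-product cross-attention: every search token
    attends to every template token, so similarity is computed globally
    rather than from local correspondences."""
    d = template.shape[-1]
    scores = search @ template.T / np.sqrt(d)        # (Ns, Nt) similarity
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)               # softmax over template tokens
    return w @ template                              # (Ns, d) fused features
```

A real pillar encoder would learn per-pillar features with a small network rather than take a max over heights, and the paper's encoder applies such attention across multiple feature scales; this sketch only shows the data flow.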