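Point cloud
Computer science
Artificial intelligence
Computer vision
Frame (networking)
Feature (linguistics)
Fuse (electrical)
Similarity (geometry)
Tracking (education)
Video tracking
Point (geometry)
Code (set theory)
Object (grammar)
Encoding
Frame of reference
Image (mathematics)
Mathematics
Engineering
Geometry
Psychology
Pedagogy
Linguistics
Programming language
Chemistry
Set (abstract data type)
Biochemistry
Philosophy
Telecommunications
Gene
Electrical engineering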
Authors
Yubo Cui,Zhiheng Li,Zheng Fang
Source
Journal: IEEE Robotics and Automation Letters
Date: 2023-08-01
Volume (issue): 8 (8): 4967-4974
Citations: 1
Identifier
DOI:10.1109/lra.2023.3290524
Abstract
3D single object tracking with point clouds is a critical task in 3D computer vision. Previous methods usually take the last two frames as input, use the predicted box to obtain the template point cloud in the previous frame and the search-area point cloud in the current frame, and then apply similarity-based or motion-based methods to predict the current box. Although these methods achieve good tracking performance, they ignore the historical information of the target, which is important for tracking. In this letter, instead of two frames, we input multiple frames of point clouds to encode the spatio-temporal information of the target and to learn its motion implicitly. This builds correlations among different frames so that the target can be tracked efficiently in the current frame. Meanwhile, rather than fusing point features directly, we first crop the point cloud features into many patches, then use a sparse attention mechanism to encode patch-level similarity, and finally fuse the multi-frame features. Extensive experiments show that our method achieves competitive results on challenging large-scale benchmarks (62.6% on KITTI and 49.66% on NuScenes). The code will be released soon.
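The patch-level fusion described in the abstract can be illustrated with a toy sketch. The abstract does not specify how patches are cropped or how the attention is sparsified, so everything below is an assumption made for illustration, not the authors' implementation: patches are formed by mean-pooling consecutive point features, sparsity is a top-k selection over dot-product similarity, and fusion is a residual sum. The names `pool_patches`, `sparse_attention_fuse`, and `fuse_frames` are hypothetical.

```python
import math


def pool_patches(features, patch_size):
    """Mean-pool consecutive point features into patch descriptors (assumed patching rule)."""
    patches = []
    for start in range(0, len(features), patch_size):
        chunk = features[start:start + patch_size]
        dim = len(chunk[0])
        patches.append([sum(p[d] for p in chunk) / len(chunk) for d in range(dim)])
    return patches


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sparse_attention_fuse(current_patches, history_patches, top_k):
    """For each current patch, attend only to its top_k most similar history patches."""
    fused = []
    for q in current_patches:
        scale = math.sqrt(len(q))
        scores = [dot(q, k) / scale for k in history_patches]
        # Sparsity (assumed form): keep only the top_k highest-scoring history patches.
        keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
        # Numerically stable softmax restricted to the kept patches.
        max_s = max(scores[i] for i in keep)
        weights = [math.exp(scores[i] - max_s) for i in keep]
        total = sum(weights)
        context = [
            sum(w * history_patches[i][d] for w, i in zip(weights, keep)) / total
            for d in range(len(q))
        ]
        # Residual fusion (assumed): current patch feature plus attended history context.
        fused.append([qc + cc for qc, cc in zip(q, context)])
    return fused


def fuse_frames(frames, patch_size=2, top_k=2):
    """frames: per-frame point features, oldest first; the last entry is the current frame."""
    current = pool_patches(frames[-1], patch_size)
    history = [p for f in frames[:-1] for p in pool_patches(f, patch_size)]
    return sparse_attention_fuse(current, history, top_k)


if __name__ == "__main__":
    past = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    now = [[1.0, 0.0], [1.0, 0.0]]
    print(fuse_frames([past, now], patch_size=2, top_k=1))
```

With `top_k=1`, the single current patch attends only to the most similar historical patch, so dissimilar patches contribute nothing to the fused feature. At least one historical frame is required, since the sketch does not handle an empty history.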