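Artificial intelligence
RGB color model
Computer science
Computer vision
Tracking (education)
BitTorrent tracker
Eye tracking
Backbone network
Supervised learning
Artificial neural network
Psychology
Pedagogy
Computer network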
Authors
Xuefeng Zhu, Tianyang Xu, Sara Atito, Muhammad Awais, Xiao-Jun Wu, Zhenhua Feng, Josef Kittler
Identifier
DOI:10.1016/j.patcog.2024.110543
Abstract
Recently, there has been growing interest in RGB-D object tracking, owing to the promising performance achieved by combining visual information with auxiliary depth cues. However, the limited volume of annotated RGB-D tracking data available for offline training has hindered the design of dedicated end-to-end RGB-D trackers. Consequently, current state-of-the-art RGB-D trackers rely mainly on the visual branch for appearance modelling, using the depth map only for elementary information fusion or for failure reasoning during online tracking. Despite this progress, current RGB-D tracking paradigms have neither fully harnessed the inherent potential of depth information nor fully exploited the synergy between vision and depth. Given the availability of ample unlabelled RGB-D data and recent advances in self-supervised learning, we address the problem of self-supervised learning for RGB-D object tracking. Specifically, an RGB-D backbone network is trained on unlabelled RGB-D datasets using masked image modelling. During training, the masking mechanism selectively occludes the input visible (RGB) image, so that the corresponding aligned depth map must help the network discern and learn vision-depth cues in order to reconstruct the masked visible image. As a result, the pre-trained backbone network learns to jointly capture crucial visual and depth features of the diverse objects and backgrounds in an RGB-D image, and the intermediate RGB-D features it outputs can be used effectively for object tracking. We therefore embed the pre-trained RGB-D network into a transformer-based tracking framework for stable tracking. Comprehensive experiments and analysis of the results on several RGB-D tracking datasets demonstrate the effectiveness and superiority of the proposed RGB-D self-supervised learning framework and of the resulting tracking approach.
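The masked-image-modelling step described above can be illustrated with a minimal sketch of the data flow only. It assumes that each RGB image and its aligned depth map have already been split into the same number of flattened patches. It also substitutes uniform random masking for the paper's "selective occlusion", and the mask ratio is a free parameter rather than a value from the abstract. The helper names (`random_rgb_mask`, `build_encoder_inputs`, `masked_reconstruction_loss`) are hypothetical, and the encoder/decoder networks that would produce `pred` are omitted entirely.

```python
import random
from typing import List, Optional, Sequence, Tuple

Patch = List[float]  # one flattened image patch (hypothetical representation)


def random_rgb_mask(num_patches: int, mask_ratio: float,
                    seed: Optional[int] = None) -> List[bool]:
    """Return one flag per RGB patch; True means the patch is hidden.

    Uniform random masking is an assumed stand-in for the paper's
    'selective occlusion'; mask_ratio is a free parameter.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    num_masked = int(round(num_patches * mask_ratio))
    hidden = set(rng.sample(range(num_patches), num_masked))
    return [i in hidden for i in range(num_patches)]


def build_encoder_inputs(rgb: Sequence[Patch], depth: Sequence[Patch],
                         mask: Sequence[bool]) -> Tuple[List[Patch], List[Patch]]:
    """Drop masked RGB patches but keep every aligned depth patch, so the
    depth tokens can supply cues about the hidden RGB regions."""
    if not (len(rgb) == len(depth) == len(mask)):
        raise ValueError("RGB, depth and mask must be patch-aligned")
    visible_rgb = [p for p, hidden in zip(rgb, mask) if not hidden]
    return visible_rgb, list(depth)


def masked_reconstruction_loss(pred: Sequence[Patch], target: Sequence[Patch],
                               mask: Sequence[bool]) -> float:
    """Mean squared error computed only over the masked RGB patches."""
    sq_errors = [
        (p - t) ** 2
        for pred_patch, tgt_patch, hidden in zip(pred, target, mask)
        if hidden
        for p, t in zip(pred_patch, tgt_patch)
    ]
    return sum(sq_errors) / len(sq_errors) if sq_errors else 0.0


# Toy usage: 16 RGB patches, 16 aligned depth patches, 75% of RGB masked.
rgb = [[float(i)] * 3 for i in range(16)]
depth = [[0.5] for _ in range(16)]
mask = random_rgb_mask(16, 0.75, seed=0)
visible_rgb, depth_tokens = build_encoder_inputs(rgb, depth, mask)
print(len(visible_rgb), len(depth_tokens))  # prints: 4 16
```

Computing the loss only on hidden RGB patches is a deliberate choice here. The visible patches can be copied through trivially, so the training signal comes from regions where the network must infer appearance from visible RGB context and the fully visible depth tokens.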