Enhancing visual tracking with a unified temporal Transformer framework
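Computer science
Transformer
Computer vision
Tracking (education)
Artificial intelligence
Engineering
Electrical engineering
Psychology
Pedagogy
Voltage
Authors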
Tianlu Zhang, Ziniu Jin, Kurt Debattista, Qiang Zhang, Jungong Han
Source
Journal: IEEE Transactions on Intelligent Vehicles [Institute of Electrical and Electronics Engineers] Date: 2024-01-01 Volume/Issue: : 1-15 Citations: 1
Identifier
DOI:10.1109/tiv.2024.3398405
Abstract
Visual object tracking is an essential research topic in computer vision, with numerous practical applications including visual surveillance systems, autonomous vehicles and intelligent transportation systems. It involves tackling challenges such as motion blur, occlusion and distractors, which require trackers to leverage temporal information, including temporal appearance information, temporal trajectory information and temporal context information. However, existing trackers usually rely on a single type of temporal information and neglect the complementarity of the different types. Additionally, the cross-frame correlations that enable the transfer of diverse temporal information during tracking are under-explored. In this work, we propose a Unified Temporal Transformer Framework (UTTF) for robust visual tracking. Our framework establishes multi-scale cross-frame relationships among historical frames and exploits the complementary information among the three typical temporal information sources. Specifically, a Pyramid Spatial-Temporal Transformer Encoder (PSTTE) is designed to mutually reinforce historical features by establishing sound multi-scale associations (i.e., token-level, semantic-level and frame-level). Furthermore, an Adaptive Fusion Transformer Decoder (AFTD) is proposed to adaptively aggregate informative temporal cues from historical frames to enhance the features of the current frame. Moreover, the proposed UTTF network can be easily extended to various tracking frameworks. Our experiments on seven prevalent visual object tracking benchmarks demonstrate that our proposed trackers outperform existing ones, establishing new state-of-the-art results.
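The abstract describes the decoder only at a high level: temporal cues from historical frames are aggregated to enhance the features of the current frame. The sketch below illustrates that general idea with plain scaled dot-product cross-attention, and it is not the paper's AFTD. It assumes a single head, no learned projections, and a residual sum; the names `cross_attend`, `current_tokens` and `history_tokens` are hypothetical and do not come from the source.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attend(current_tokens, history_tokens):
    """Enhance current-frame tokens with a weighted sum of history tokens.

    Assumption: generic single-head cross-attention with a residual
    connection, used here only to illustrate temporal aggregation.
    """
    dim = len(current_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    enhanced, all_weights = [], []
    for query in current_tokens:
        # Similarity of this current-frame token to every history token.
        weights = softmax([dot(query, key) * scale for key in history_tokens])
        # Attention-weighted average of the history tokens.
        context = [
            sum(w * key[d] for w, key in zip(weights, history_tokens))
            for d in range(dim)
        ]
        # Residual sum: keep the current feature and add the temporal cue.
        enhanced.append([q + c for q, c in zip(query, context)])
        all_weights.append(weights)
    return enhanced, all_weights


# Toy example: two current-frame tokens, three tokens from past frames.
current = [[1.0, 0.0], [0.0, 1.0]]
history = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
enhanced, weights = cross_attend(current, history)
```

In this toy setup each current-frame token puts the most weight on the history token that points in the same direction. A learned decoder would add query, key and value projections and multiple heads on top of this pattern.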