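Computer science
Artificial intelligence
Feature tracking
Computer vision
Feature (linguistics)
Transformer
Feature extraction
Pattern recognition (psychology)
Voltage
Engineering
Philosophy
Linguistics
Electrical engineering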
Authors
Baozhen Sun, Zhenhua Wang, Shilei Wang, Yongkang Cheng, Jifeng Ning
Source
Journal: IEEE Transactions on Circuits and Systems for Video Technology
[Institute of Electrical and Electronics Engineers]
Date: 2024-03-13
Volume (issue): 34 (8): 7259-7271
Identifier
DOI: 10.1109/tcsvt.2024.3376690
Abstract
Empowered by the Transformer's ability to model long-range dependencies, tracking performance has increased markedly in recent years. Approaches in this vein use Transformer features to integrate information from the target and search regions while neglecting the superior local representation extracted by their CNN backbone. To address this, we introduce a BIdirectional inTeraction mechanism between CNN and Transformer features for visual tracking, termed BIT-Tracker, which enables a comprehensive fusion of local and global representations and thus boosts tracking performance. The first ingredient of BIT-Tracker is an aggregation of multi-level Transformer features to achieve better global modeling ability. To combine the merits of local and global representations, the second ingredient performs a bi-directional interaction between CNN and Transformer features, where the interaction is achieved by either querying the CNN feature from the Transformer feature or querying the Transformer feature from the CNN feature. Afterwards, the outputs from both directions are fused to predict the temporal locations of targets. Extensive experiments demonstrate the effectiveness of the proposed feature aggregation and bi-directional interaction modules. BIT-Tracker achieves leading performance on eight tracking benchmarks and outperforms SOTA results by clear margins. Code will be made available.
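The bi-directional interaction described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes single-head cross-attention with no learned projections, equal token counts and channel widths for both feature sets, and fusion by channel concatenation. The names `cross_attention` and `bidirectional_interaction` are hypothetical, and the abstract does not specify any of these details.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_feat, context_feat):
    # Assumption: single head, no learned Q/K/V projections.
    # query_feat: (Nq, d), context_feat: (Nc, d) -> output: (Nq, d)
    d = query_feat.shape[-1]
    scores = query_feat @ context_feat.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ context_feat


def bidirectional_interaction(cnn_feat, trans_feat):
    # Direction 1: Transformer tokens query the CNN feature.
    trans_queries_cnn = cross_attention(trans_feat, cnn_feat)
    # Direction 2: CNN tokens query the Transformer feature.
    cnn_queries_trans = cross_attention(cnn_feat, trans_feat)
    # Assumption: fuse both directions by concatenating channels.
    return np.concatenate([trans_queries_cnn, cnn_queries_trans], axis=-1)


rng = np.random.default_rng(0)
cnn_feat = rng.standard_normal((16, 8))    # 16 tokens, 8 channels (toy sizes)
trans_feat = rng.standard_normal((16, 8))
fused = bidirectional_interaction(cnn_feat, trans_feat)
print(fused.shape)  # (16, 16)
```

In a real tracker the fused feature would feed a prediction head, and each direction would normally use learned projections and multiple heads; those parts are omitted here to keep the two query directions visible.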