Computer science
Artificial intelligence
Fusion
Computer vision
Transformer
Engineering
Philosophy
Linguistics
Voltage
Electrical engineering
Authors
Yabin Zhu, Chenglong Li, Xiao Wang, Jin Tang, Zhixiang Huang
Source
Journal: IEEE Transactions on Circuits and Systems for Video Technology
[Institute of Electrical and Electronics Engineers]
Date: 2024-01-01
Volume/Issue: 1-1
Identifier
DOI:10.1109/tcsvt.2024.3391802
Abstract
Existing Transformer-based RGB-Thermal (RGBT) tracking methods either use cross-attention to fuse the two modalities, or use self-attention and cross-attention to model both modality-specific and modality-shared information. However, the significant appearance gap between modalities limits the feature representation ability of certain modalities during the fusion process. To address this problem, we propose a novel Progressive Fusion Transformer called ProFormer, which progressively integrates single-modality information into the multimodal representation for robust RGBT tracking. In particular, ProFormer first uses a self-attention module to collaboratively extract the multimodal representation. Then, ProFormer introduces two cross-attention modules through which this representation interacts with the features of each of the two modalities, enhancing modality-specific information in the multimodal representation. In addition, we propose a dynamically guided learning algorithm that adaptively employs the well-performing branches to guide the learning of the other branches, improving the representation ability of each branch. Extensive experiments demonstrate that the proposed ProFormer achieves new state-of-the-art performance on the RGBT210, RGBT234, LasHeR, and VTUAV datasets.
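The two-stage fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how tokens are combined, whether the cross-attention modules run sequentially or in parallel, or which projections and normalization layers are used. The sketch below assumes that the RGB and thermal tokens are concatenated for the self-attention stage, that the two cross-attention steps run one after the other with residual connections, and that attention has no learned weights. All function names (`attention`, `progressive_fusion`, and the helpers) are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention without learned projections (a simplification)."""
    scale = math.sqrt(len(queries[0]))
    keys_t = [list(col) for col in zip(*keys)]  # transpose keys for Q K^T
    scores = matmul(queries, keys_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, values)


def add(a, b):
    """Element-wise sum of two equally shaped matrices (residual connection)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def progressive_fusion(rgb_tokens, thermal_tokens):
    """Hypothetical sketch: self-attention first, then two cross-attention steps."""
    # Stage 1 (assumed): self-attention over the concatenated tokens of both
    # modalities yields an initial multimodal representation.
    joint = rgb_tokens + thermal_tokens
    fused = add(joint, attention(joint, joint, joint))
    # Stage 2 (assumed order): the multimodal representation queries each
    # single-modality feature set in turn to pull in modality-specific detail.
    fused = add(fused, attention(fused, rgb_tokens, rgb_tokens))
    fused = add(fused, attention(fused, thermal_tokens, thermal_tokens))
    return fused


if __name__ == "__main__":
    rgb = [[1.0, 0.0], [0.0, 1.0]]  # two toy RGB tokens of dimension 2
    thermal = [[0.5, 0.5]]          # one toy thermal token
    fused = progressive_fusion(rgb, thermal)
    print(len(fused), len(fused[0]))
```

In this sketch the fused output keeps one token per input token (RGB plus thermal) at the original feature dimension. A practical model would add learned query, key, and value projections, multiple heads, and normalization.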