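Security token
Division (mathematics)
Computer science
Modal verb
Computer vision
RGB color model
Artificial intelligence
Tracking (education)
Arithmetic
Mathematics
Computer network
Psychology
Pedagogy
Chemistry
Polymer chemistry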
Authors
Yujue Cai, Xiubao Sui, Guohua Gu, Qian Chen
Identifier
DOI:10.1016/j.patcog.2024.110626
Abstract
RGB-T tracking takes visible and infrared images as inputs and is an extended application of multi-modal fusion in the field of visual object tracking. The complementarity between the visible and infrared modalities can enhance the robustness of a tracker in complex scenes. Cross-modal interaction can facilitate the fusion and synergy of different modalities, but most previous methods lack clear target information during multi-modal fusion, which leads to undesired cross-relations in the interaction. To reduce these undesired cross-relations, we propose a Multi-modal Interaction scheme Guided by a Token Division strategy (MIGTD). This scheme divides the input multi-modal tokens into several categories and restricts the interaction between tokens by setting different rules. These operations are implemented in parallel through an attention masking strategy. To classify search tokens accurately, an instance segmentation task with a box-supervised loss is employed. We conduct extensive experiments on three popular benchmark datasets: RGBT234, LasHeR, and VTUAV. The experimental results indicate that the proposed tracker achieves performance competitive with state-of-the-art methods.
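The idea of restricting token interaction through an attention mask can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract names neither the token categories nor the interaction rules, so the three categories (`TEMPLATE`, `TARGET`, `BACKGROUND`), the `ALLOWED` rule table, and both function names below are hypothetical and chosen only for illustration. The sketch shows how per-category rules can be turned into a boolean mask that is applied to all tokens within one attention computation.

```python
import math

# Hypothetical category labels; the abstract does not name the categories.
TEMPLATE, TARGET, BACKGROUND = 0, 1, 2

# Hypothetical rule table: ALLOWED[q][k] is True when a query token of
# category q may attend to a key token of category k.
ALLOWED = [
    [True, True, False],  # template queries ignore background keys
    [True, True, False],  # target queries ignore background keys
    [True, True, True],   # background queries may attend to every key
]


def build_mask(categories, allowed=ALLOWED):
    """Expand per-category rules into an N x N boolean attention mask."""
    return [[allowed[q][k] for k in categories] for q in categories]


def masked_attention(queries, keys, values, mask):
    """Scaled dot-product attention in which masked pairs get zero weight."""
    dim = len(queries[0])
    outputs, all_weights = [], []
    for query, mask_row in zip(queries, mask):
        if not any(mask_row):
            raise ValueError("each query needs at least one allowed key")
        # Disallowed pairs receive a score of -inf before the softmax.
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            if ok else -math.inf
            for key, ok in zip(keys, mask_row)
        ]
        top = max(scores)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append([
            sum(w * value[j] for w, value in zip(weights, values))
            for j in range(len(values[0]))
        ])
        all_weights.append(weights)
    return outputs, all_weights


# Toy example: one template token, one target token, two background tokens.
categories = [TEMPLATE, TARGET, BACKGROUND, BACKGROUND]
tokens = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.8]]
mask = build_mask(categories)
outputs, weights = masked_attention(tokens, tokens, tokens, mask)
```

Under these assumed rules, the template and target rows of `weights` assign zero weight to the two background tokens, while each row still sums to one. Because the rules live entirely in the mask, every token is processed by the same attention computation, which is the sense in which a masking strategy lets category-specific rules run together rather than in separate passes.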