Computer science
Artificial intelligence
Concatenation (mathematics)
Eye movement
Encoder
Pattern recognition (psychology)
Speech recognition
Mathematics
Combinatorics
Operating system
Authors
Long Gao, Langkun Chen, Pan Liu, Yan Jiang, Jifeng Ning
Identifier
DOI: 10.1016/j.patcog.2023.109964
Abstract
Transformer-based trackers have demonstrated promising performance in visual object tracking. Nevertheless, two drawbacks limit their potential for further improvement. First, the static receptive field of the tokens within a single attention layer of standard self-attention neglects the multi-scale nature of the object tracking task. Second, the multi-layer perceptron (MLP) in the feed-forward network (FFN) lacks local interaction information among samples. To address these issues, a new self-attention learning method, fine–coarse concatenated attention (FCA), is proposed to learn self-attention with both fine- and coarse-granularity information. Moreover, a cross-concatenation MLP (CC-MLP) is developed to capture local interaction information across samples. Based on these two modules, a novel encoder and decoder are constructed and integrated into an all-attention tracking algorithm, FCAT. Comprehensive experiments on the popular tracking datasets OTB2015, LaSOT, GOT-10k, and TrackingNet demonstrate the effectiveness of FCA and CC-MLP, and FCAT achieves state-of-the-art performance on these datasets.
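The abstract gives no implementation details, but the FCA idea can be illustrated with a minimal sketch. The PyTorch code below assumes that "fine and coarse granularity" means letting each fine-granularity query attend over the concatenation of the original tokens and a pooled (coarse) copy of them, so a single attention layer covers two receptive-field scales. The class name, pooling operator, and head count are illustrative assumptions, not the authors' published design.

```python
import torch
import torch.nn as nn

class FineCoarseAttention(nn.Module):
    """Hypothetical sketch of fine-coarse concatenated attention (FCA).

    Assumption: "fine and coarse granularity" is read as attending over
    the concatenation of the original tokens (fine) and an average-pooled
    copy (coarse). All names and hyperparameters here are illustrative.
    """

    def __init__(self, dim: int, num_heads: int = 8, pool: int = 2):
        super().__init__()
        # dim must be divisible by num_heads for multi-head attention.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Coarse branch: pooling enlarges the effective receptive field
        # of each key/value token.
        self.pool = nn.AvgPool1d(kernel_size=pool, stride=pool)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) fine-granularity tokens
        coarse = self.pool(x.transpose(1, 2)).transpose(1, 2)
        # Queries stay fine; keys/values are fine + coarse concatenated,
        # so each query sees two scales within one attention layer.
        kv = torch.cat([x, coarse], dim=1)
        out, _ = self.attn(x, kv, kv)
        return out
```

Under this reading, the coarse tokens act like a downsampled summary of the search region, which is one plausible way a single layer could respect the multi-scale nature of tracking targets.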
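Similarly, one plausible reading of CC-MLP is that each token is concatenated with a neighboring token before the FFN, so the otherwise point-wise MLP receives cross-token input. The sketch below illustrates only that reading; `CrossConcatMLP`, the cyclic-shift neighborhood, and the expansion ratio are hypothetical choices not taken from the paper.

```python
import torch
import torch.nn as nn

class CrossConcatMLP(nn.Module):
    """Hypothetical sketch of a cross-concatenation MLP (CC-MLP).

    Assumption: "local interaction across samples" is read as pairing each
    token with a shifted neighbor before the usual two-layer FFN. The
    neighborhood and expansion ratio are illustrative, not the authors'.
    """

    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        self.fc1 = nn.Linear(2 * dim, expansion * dim)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(expansion * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        # Pair each token with its left neighbor (cyclic shift) so the
        # point-wise MLP sees locally interacting, cross-token input.
        neighbor = torch.roll(x, shifts=1, dims=1)
        return self.fc2(self.act(self.fc1(torch.cat([x, neighbor], dim=-1))))
```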