计算机科学
自然语言
背景(考古学)
人工智能
布线(电子设计自动化)
变化(天文学)
自然语言处理
跟踪(教育)
自然语言用户界面
对象(语法)
计算机网络
古生物学
心理学
物理
生物
天体物理学
教育学
标识
DOI:10.1145/3474085.3475349
摘要
Tracking with Natural-Language Specification (TNL) is a joint topic of understanding the vision and natural language with a wide range of applications. In previous works, the communication between two heterogeneous features of vision and language is mainly through a simple dynamic convolution. However, the performance of prior works is capped by the difficulty of linguistic variation of natural language in modeling the dynamically changing target and its surroundings. In the meanwhile, natural language and vision are firstly fused and then utilized for tracking, which is hard to model the query-focused context. Query-focused should pay more attention to context modeling to promote the correlation between these two features. To address these issues, we propose a capsule-based network, referred to as CapsuleTNL, which performs regression tracking with natural language query. In the beginning, the visual and textual input is encoded with capsules, which can not only establish the relationship between entities but also the relationship between the parts of the entity itself. Then, we devise two interaction routing modules, which consist of visual-textual routing module to reduce the linguistic variation of input query and textual-visual routing module to precisely incorporate query-based visual cues simultaneously. To validate the potential of the proposed network for visual object tracking, we evaluate our method on two large tracking benchmarks. The experimental evaluation demonstrates the effectiveness of our capsule-based network.
科研通智能强力驱动
Strongly Powered by AbleSci AI