Computer science
Exploit
Artificial intelligence
Graph
Object detection
Scene graph
Context model
Machine learning
Theoretical computer science
Pattern recognition (psychology)
Object (grammar)
Computer security
Rendering (computer graphics)
Authors
Aijia Yang, Sihao Lin, Chung-Hsing Yeh, Minglei Shu, Yi Yang, Xiaojun Chang
Identifier
DOI:10.1109/tmm.2023.3266897
Abstract
The human visual system not only recognizes individual objects but also comprehends the contextual relationships between them in real-world scenes, and this contextual understanding is highly advantageous for object detection. In practical applications, however, such contextual information is often unavailable. Previous attempts to compensate for this by using cross-modal data such as language and statistics to obtain contextual priors have proven sub-optimal due to the semantic gap. To overcome this challenge, we present a seamless integration of context into an object detector through Knowledge Distillation. Our approach represents context as a knowledge graph that describes the relative location and semantic relevance of different visual concepts. Leveraging recent advances in graph representation learning with Transformers, we exploit the contextual information among objects using edge encoding and graph attention. Specifically, each image region propagates and aggregates representations from its most similar neighbors to form the knowledge graph in the Transformer encoder. Extensive experiments and a thorough ablation study on the challenging MS-COCO, Pascal VOC, and LVIS benchmarks demonstrate the superiority of our method.
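To make the edge-encoding and graph-attention idea in the abstract concrete, below is a minimal PyTorch sketch of one attention layer in which each image region aggregates features from its neighbors and the attention logits are biased by an encoding of pairwise relative box geometry. This is an illustrative assumption of how such a layer could look, not the authors' released implementation; the class name, the 4-d geometry features, the MLP edge encoder, and all shapes are hypothetical.

```python
# Illustrative sketch: graph attention over region features with an edge-encoded
# bias derived from relative box locations. All names and design choices here
# are assumptions, not the paper's official code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EdgeBiasedGraphAttention(nn.Module):
    """Each region attends to all others; attention logits combine content
    similarity with a per-head bias computed from pairwise box geometry."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Maps 4-d relative geometry (dx, dy, log dw, log dh) to a per-head bias.
        self.edge_mlp = nn.Sequential(
            nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, num_heads)
        )

    @staticmethod
    def relative_geometry(boxes: torch.Tensor) -> torch.Tensor:
        """boxes: (N, 4) as (cx, cy, w, h). Returns (N, N, 4) pairwise features."""
        cx, cy, w, h = boxes.unbind(-1)
        dx = (cx[:, None] - cx[None, :]) / w[:, None].clamp(min=1e-3)
        dy = (cy[:, None] - cy[None, :]) / h[:, None].clamp(min=1e-3)
        dw = torch.log(w[None, :] / w[:, None].clamp(min=1e-3))
        dh = torch.log(h[None, :] / h[:, None].clamp(min=1e-3))
        return torch.stack([dx, dy, dw, dh], dim=-1)

    def forward(self, feats: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
        """feats: (N, dim) region features; boxes: (N, 4) region boxes."""
        n, dim = feats.shape
        q, k, v = self.qkv(feats).chunk(3, dim=-1)
        # Reshape each to (heads, N, head_dim).
        q = q.view(n, self.num_heads, self.head_dim).transpose(0, 1)
        k = k.view(n, self.num_heads, self.head_dim).transpose(0, 1)
        v = v.view(n, self.num_heads, self.head_dim).transpose(0, 1)
        # Content similarity plus geometric edge bias, per head: (heads, N, N).
        logits = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        edge_bias = self.edge_mlp(self.relative_geometry(boxes))  # (N, N, heads)
        logits = logits + edge_bias.permute(2, 0, 1)
        attn = F.softmax(logits, dim=-1)
        out = (attn @ v).transpose(0, 1).reshape(n, dim)
        return feats + self.proj(out)  # residual update of region features


# Usage: aggregate context for 5 proposal regions with 256-d features.
layer = EdgeBiasedGraphAttention(dim=256, num_heads=8)
feats = torch.randn(5, 256)
boxes = torch.rand(5, 4) + 0.1  # (cx, cy, w, h), kept positive
updated = layer(feats, boxes)
print(updated.shape)  # torch.Size([5, 256])
```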