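Computer science
Semantics (computer science)
Construct (python library)
Image retrieval
Flexibility (engineering)
Image (mathematics)
Benchmark (surveying)
Information retrieval
Artificial intelligence
Mathematics
Geodesy
Statistics
Programming language
Geography
Authors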
Song Yang, Qiang Li, Wenhui Li, Min Liu, Xuanya Li, An-An Liu
Identifier
DOI:10.1145/3581783.3613786
Abstract
Image-text retrieval is a fundamental branch of cross-modal retrieval. Its core is to explore semantic correspondence in order to align relevant image-text pairs. Some existing methods rely on global semantics and co-occurrence frequency to design knowledge introduction patterns for consistent representations. However, they lack flexibility because they are limited to fixed information and empirical feedback. To address these issues, we develop an External Knowledge Dynamic Modeling (EKDM) architecture based on a filtering mechanism, which dynamically explores different knowledge for different image-text pairs. Specifically, we first capture abundant concepts and relationships from external knowledge to construct visual and textual corpus sets. Then, we progressively explore concepts related to images and texts using dynamic global representations. To endow the model with the capability of relationship decision, we integrate the variable spatial locations between objects for association exploration. Because the filtering mechanism is conditioned on dynamic semantics and variable spatial locations, our model can dynamically model different knowledge for different image-text pairs. Extensive experimental results on two benchmark datasets demonstrate the effectiveness of our proposed method.