Relation-Aggregated Cross-Graph Correlation Learning for Fine-Grained Image–Text Retrieval

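Computer science; Relation (database); Graph; Artificial intelligence; Feature (linguistics); Information retrieval; Feature learning; Encoder; Focus (optics); Pattern recognition (psychology); Natural language processing; Data mining; Theoretical computer science; Physics; Optics; Operating system; Linguistics; Philosophy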

Authors

Shu‐Juan Peng, Yi He, Xin Liu, Yiu‐ming Cheung, Xing Xu, Zhen Cui

Source

Journal: IEEE Transactions on Neural Networks and Learning Systems [Institute of Electrical and Electronics Engineers]
Date: 2022-07-13; Volume (Issue): 35 (2), pages 2194-2207; Cited by: 13

Links

nih.gov, doi.org

Identifier

DOI: 10.1109/tnnls.2022.3188569

Abstract

Fine-grained image-text retrieval is an active research topic in bridging vision and language, and its main challenge is learning the semantic correspondence across different modalities. Existing methods mainly focus on learning the global semantic correspondence or the intramodal relation correspondence in separate data representations, but rarely consider the intermodal relations that interactively provide complementary hints for fine-grained semantic correlation learning. To address this issue, we propose a relation-aggregated cross-graph (RACG) model that explicitly learns the fine-grained semantic correspondence by aggregating both intramodal and intermodal relations, which are then used to guide the feature correspondence learning process. More specifically, we first build semantic-embedded graphs to explore both the fine-grained objects and their relations in each media type, aiming not only to characterize the object appearance in each modality but also to capture the intrinsic relation information that differentiates intramodal discrepancies. Then, a newly designed cross-graph relation encoder explores the intermodal relations across modalities, mutually boosting the cross-modal correlations to learn more precise intermodal dependencies. In addition, a feature reconstruction module and multihead similarity alignment are used to optimize the node-level semantic correspondence, so that discriminative relation-aggregated cross-modal embeddings of image and text are obtained for image-text retrieval tasks. Extensive experiments on benchmark datasets verify, both quantitatively and qualitatively, the advantages of the proposed framework for fine-grained image-text retrieval and show that its performance is competitive with the state of the art.
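
The abstract gives no equations, so the following is only a toy illustration of the general idea of letting the nodes of one modality attend to the nodes of the other and then scoring node-level agreement. It is not the RACG model: the function names (`cross_graph_attend`, `node_level_similarity`), the scaled dot-product attention, the cosine scoring, and the three-dimensional example vectors are all assumptions made for this sketch.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def cross_graph_attend(query_nodes, context_nodes):
    """Hypothetical intermodal step: each query node becomes an
    attention-weighted mixture of the other modality's nodes."""
    dim = len(context_nodes[0])
    scale = math.sqrt(dim)
    attended = []
    for q in query_nodes:
        weights = softmax([dot(q, c) / scale for c in context_nodes])
        mixed = [sum(w * c[i] for w, c in zip(weights, context_nodes))
                 for i in range(dim)]
        attended.append(mixed)
    return attended


def node_level_similarity(image_nodes, text_nodes):
    """Average cosine similarity between each image node and its
    text-attended counterpart (an assumed scoring rule)."""
    attended = cross_graph_attend(image_nodes, text_nodes)
    sims = [cosine(n, a) for n, a in zip(image_nodes, attended)]
    return sum(sims) / len(sims)


# Made-up node features: two image regions, two candidate captions.
image_nodes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
matching_text = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]
unrelated_text = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]

score_match = node_level_similarity(image_nodes, matching_text)
score_unrelated = node_level_similarity(image_nodes, unrelated_text)
```

In this toy setup the caption whose nodes align with the image regions receives the higher score, which is the behaviour a retrieval model of this kind would rank on. The intramodal graph construction, feature reconstruction, and multihead alignment described in the abstract are not modelled here.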
