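Computer science
Embedding
Natural language processing
Artificial intelligence
Phrase
Representation (politics)
Action (physics)
Information retrieval
Image (mathematics)
Visual Word
Relation (database)
Word embedding
Image retrieval
Bag-of-words model
Bridging (networking)
Object (grammar)
Pattern recognition (psychology)
Data mining
Physics
Quantum mechanics
Politics
Political science
Law
Computer network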
Authors
Jiangtong Li,Li Niu,Liqing Zhang
Source
Journal: Proceedings of the ... AAAI Conference on Artificial Intelligence
[Association for the Advancement of Artificial Intelligence (AAAI)]
Date: 2022-06-28
Volume (issue): 36 (2): 1323-1331
Citations: 19
Identifier
DOI:10.1609/aaai.v36i2.20020
Abstract
Image-text retrieval plays a central role in bridging vision and language, and aims to reduce the semantic discrepancy between images and texts. Most existing works rely on refined word and object representations, learned through data-oriented methods, to capture word-object co-occurrence. Such approaches tend to ignore the asymmetric action relation between images and texts: the text has an explicit action representation (i.e., a verb phrase), while the image contains only implicit action information. In this paper, we propose the Action-aware Memory-Enhanced embedding (AME) method for image-text retrieval, which emphasizes action information when mapping images and texts into a shared embedding space. Specifically, we integrate action prediction with an action-aware memory bank to enrich the image and text features with action-similar text features. The effectiveness of the proposed AME method is verified by comprehensive experimental results on two benchmark datasets.