Computer science
Recipe
Ranking (information retrieval)
Modal
Focus (optics)
Pairwise comparison
Artificial intelligence
Exploit
Scope (computer science)
Image (mathematics)
Word (group theory)
Image retrieval
Machine learning
Information retrieval
Pattern recognition (psychology)
Natural language processing
Mathematics
Chemistry
Food science
Polymer chemistry
Physics
Geometry
Computer security
Optics
Programming language
Authors
Da Cao, Jingjing Chu, Ningbo Zhu, Liqiang Nie
Identifier
DOI:10.1016/j.knosys.2019.105428
Abstract
Cross-modal recipe retrieval refers to the problem of retrieving a food image from a list of image candidates given a textual recipe as the query, or vice versa. However, existing cross-modal recipe retrieval approaches mostly focus on learning the representations of images and recipes independently and stitching them together by projecting them into a common space. Such methods overlook the interplay between images and recipes, resulting in suboptimal retrieval performance. Toward this end, we study the problem of cross-modal recipe retrieval from the viewpoint of learning parallel- and cross-attention networks. Specifically, we first exploit a parallel-attention network to independently learn the attention weights of components in images and recipes. Thereafter, a cross-attention network is proposed to explicitly learn the interplay between images and recipes, which simultaneously considers word-guided image attention and image-guided word attention. Lastly, the learnt representations of images and recipes stemming from the parallel- and cross-attention networks are elaborately connected and optimized using a pairwise ranking loss. By experimenting on two datasets, we demonstrate the effectiveness and rationality of our proposed solution in terms of both overall performance comparison and micro-level analyses.
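The two core ideas in the abstract, cross-attention in both directions (word-guided image attention and image-guided word attention) and a pairwise ranking loss over the resulting embeddings, can be illustrated with a minimal numpy sketch. This is not the paper's exact architecture: the affinity matrix, mean pooling, cosine similarity, and the margin value are illustrative assumptions, and all function and variable names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(img_regions, recipe_words):
    """Cross-attend image region features and recipe word features.

    img_regions:  (n_regions, d) array of image component features.
    recipe_words: (n_words, d) array of recipe word features.
    Returns pooled (d,) vectors for the image and the recipe.
    """
    affinity = img_regions @ recipe_words.T            # (n_regions, n_words)
    # Word-guided image attention: each word attends over image regions.
    img_attn = softmax(affinity, axis=0)               # normalize over regions
    attended_img = img_attn.T @ img_regions            # (n_words, d)
    # Image-guided word attention: each region attends over recipe words.
    word_attn = softmax(affinity, axis=1)              # normalize over words
    attended_words = word_attn @ recipe_words          # (n_regions, d)
    # Pool into single vectors for matching (mean pooling is an assumption).
    return attended_img.mean(axis=0), attended_words.mean(axis=0)

def pairwise_ranking_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style pairwise ranking loss on cosine similarity."""
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(0.0, margin - cos(anchor, positive) + cos(anchor, negative))

# Toy usage with random features.
rng = np.random.default_rng(0)
V = rng.normal(size=(4, 8))   # 4 image regions, 8-dim features
W = rng.normal(size=(6, 8))   # 6 recipe words, 8-dim features
img_vec, rec_vec = cross_attention(V, W)
loss = pairwise_ranking_loss(img_vec, rec_vec, rng.normal(size=8))
```

In training, the loss would be accumulated over mini-batches with the hardest or a sampled negative, pushing a matched image-recipe pair closer than any mismatched pair by at least the margin.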