Keywords
Weighting
Image retrieval
Metric (unit)
Similarity (geometry)
Modality
Computer science
Polynomial
Pattern recognition (psychology)
Artificial intelligence
Function (biology)
Machine learning
Mathematics
Image (mathematics)
Radiology
Mathematical analysis
Biology
Evolutionary biology
Economics
Medicine
Operations management
Chemistry
Polymer chemistry
Authors
Jiwei Wei, Yang Yang, Xing Xu, Xiaofeng Zhu, Heng Tao Shen
Identifier
DOI: 10.1109/tpami.2021.3088863
Abstract
Cross-modal retrieval, which aims to match instances captured from different modalities, has recently attracted growing attention. The performance of cross-modal retrieval methods relies heavily on the capability of metric learning to mine and weight informative pairs. While various metric learning methods have been developed for unimodal retrieval tasks, cross-modal retrieval tasks have not been explored to their fullest extent. In this paper, we develop a universal weighting metric learning framework for cross-modal retrieval, which can effectively sample informative pairs and assign proper weight values to them based on their similarity scores, so that different pairs receive different penalty strengths. Based on this framework, we introduce two types of polynomial loss for cross-modal retrieval: self-similarity polynomial loss and relative-similarity polynomial loss. The former provides a polynomial function that associates weight values with self-similarity scores, and the latter defines a polynomial function that associates weight values with relative-similarity scores. Both self- and relative-similarity polynomial losses can be freely applied to off-the-shelf methods to further improve their retrieval performance. Extensive experiments on two image-text retrieval datasets, three video-text retrieval datasets, and one fine-grained image retrieval dataset demonstrate that our proposed method achieves a noticeable boost in retrieval performance.
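To make the abstract's central idea concrete, the sketch below illustrates what weighting pairs by a polynomial of their similarity scores can look like for a cross-modal similarity matrix. It is a minimal illustration under stated assumptions, not the authors' released implementation: the coefficient values, the margin, and all function names are hypothetical, and only a self-similarity-style variant is shown.

```python
# Minimal sketch of polynomial pair weighting for cross-modal retrieval.
# NOT the paper's exact formulation: coefficients, margin, and names are
# illustrative assumptions.
import torch

def polynomial_weight(s, coeffs):
    """w(s) = coeffs[0] + coeffs[1]*s + coeffs[2]*s**2 + ..."""
    return sum(c * s.pow(i) for i, c in enumerate(coeffs))

def self_similarity_polynomial_loss(sim, margin=0.2,
                                    pos_coeffs=(1.0, -1.0),  # assumed: w(s) = 1 - s
                                    neg_coeffs=(0.0, 1.0)):  # assumed: w(s) = s
    """sim[i, j] is the cosine similarity between image i and text j;
    matched pairs lie on the diagonal."""
    n = sim.size(0)
    pos = sim.diag()                                  # self-similarity of matched pairs
    off = ~torch.eye(n, dtype=torch.bool, device=sim.device)
    neg = sim[off].view(n, n - 1)                     # mismatched pairs in each row

    # Pair mining: only negatives within `margin` of the matched score survive.
    hinge = torch.clamp(neg - pos.unsqueeze(1) + margin, min=0)

    # Polynomial weighting: easy positives (high score) and easy negatives
    # (low score) get small weights; hard pairs get large ones.
    w_pos = polynomial_weight(pos, pos_coeffs).clamp(min=0)
    w_neg = polynomial_weight(neg, neg_coeffs).clamp(min=0)

    return (w_pos * torch.clamp(margin - pos, min=0)).mean() \
         + (w_neg * hinge).sum(dim=1).mean()

# Usage: with L2-normalized embeddings, sim = img_emb @ txt_emb.t(),
# then loss = self_similarity_polynomial_loss(sim).
```

Because the weighting is just a multiplicative factor on standard hinge terms, a scheme of this kind can be bolted onto an existing retrieval loss, which matches the abstract's claim that the losses apply to off-the-shelf methods.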