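Computer science
Image retrieval
Artificial intelligence
Representation (politics)
Inference
Block (permutation group theory)
Context (archaeology)
Focus (optics)
Graph
Pattern recognition (psychology)
Matching (statistics)
Image (mathematics)
Visual Word
Information retrieval
Natural language processing
Theoretical computer science
Mathematics
Paleontology
Politics
Political science
Law
Biology
Statistics
Physics
Geometry
Optics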
Authors
Song Yang, Qiang Li, Wenhui Li, Xuanya Li, An-An Liu
Source
Journal: IEEE Transactions on Circuits and Systems for Video Technology
[Institute of Electrical and Electronics Engineers]
Date: 2022-11-01
Volume (issue), pages: 32 (11): 8037-8050
Citations: 73
Identifier
DOI: 10.1109/tcsvt.2022.3182426
Abstract
Image-text retrieval is a fundamental task in multimedia retrieval and has received growing attention because it connects heterogeneous data. Previous methods that perform well on image-text retrieval mainly focus on the interaction between image regions and text words. However, these approaches lack joint exploration of the characteristics and contexts of regions and words, which causes semantic confusion between similar objects and a loss of contextual understanding. To address these issues, a dual-level representation enhancement network (DREN) is proposed to strengthen the characteristic and contextual representations through block-level and instance-level representation enhancement modules, respectively. The block-level module focuses on mining the potential relations between multiple blocks within each instance representation, while the instance-level module concentrates on learning the contextual relations between different instances. To facilitate accurate matching of image-text pairs, we propose graph correlation inference and weighted adaptive filtering to conduct local and global matching between image-text pairs. Extensive experiments on two challenging datasets (i.e., Flickr30K and MSCOCO) verify the superiority of our method for image-text retrieval.