Towards Fast and Accurate Image-Text Retrieval With Self-Supervised Fine-Grained Alignment

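Computer science; Image retrieval; Artificial intelligence; Information retrieval; Image (mathematics); Automatic image annotation; Pattern recognition (psychology); Computer vision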

Authors

Jiamin Zhuang, Jing Yu, Yang Ding, Xiangyan Qu, Yue Hu

Source

Journal: IEEE Transactions on Multimedia [Institute of Electrical and Electronics Engineers]
Date: 2023-05-29; Volume: 26, pages 1361-1372; Citations: 4

Links

arxiv.org, datacite.org, doi.org

Identifier

DOI: 10.1109/tmm.2023.3280734

Abstract

Image-text retrieval requires the system to bridge the heterogeneous gap between vision and language for accurate retrieval while keeping the network lightweight enough for efficient retrieval. Existing trade-off solutions mainly either incorporate cross-modal interactions into the independent-embedding framework or leverage stronger pre-trained encoders, and both still demand time-consuming similarity measurement or a heavyweight model structure in the retrieval stage. In this work, we propose an image-text alignment module, SelfAlign, on top of the independent-embedding framework, which improves retrieval accuracy while maintaining retrieval efficiency without extra supervision. SelfAlign contains two collaborative sub-modules that enforce image-text alignment at both the concept level and the context level through self-supervised contrastive learning. It does not require cross-modal embedding interactions during training while maintaining independent image and text encoders during retrieval. With comparable time cost, SelfAlign consistently boosts the accuracy of state-of-the-art non-pre-training independent-embedding models by 9.1%, 4.2%, and 6.6% in terms of R@sum score on the Flickr30K, MS-COCO 1K, and MS-COCO 5K datasets, respectively. The retrieval accuracy also outperforms most existing interactive-embedding models, with an orders-of-magnitude decrease in retrieval time. The source code is available at: https://github.com/Zjamie813/SelfAlign .
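As a rough illustration of the evaluation setting the abstract refers to, the sketch below scores retrieval with independently computed image and text embeddings and then sums recall values into an R@sum figure. It is not taken from the SelfAlign repository. It assumes that R@sum follows the common convention of adding R@1, R@5, and R@10 over both retrieval directions (the abstract itself does not define it), that image i and text i form the only ground-truth pair (a simplification of datasets with several captions per image), and that similarity is cosine similarity. All function names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def similarity_matrix(image_embs, text_embs):
    """Cosine similarity of every image (row) against every text (column).

    Each embedding is produced independently, so retrieval needs only dot
    products and no cross-modal interaction between the two encoders.
    """
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) for t in texts] for i in images]


def recall_at_k(sim, k):
    """Percentage of queries (rows) whose same-index match ranks in the top k."""
    hits = 0
    for query, row in enumerate(sim):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if query in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(sim)


def r_sum(sim, ks=(1, 5, 10)):
    """Sum of R@k over image-to-text and text-to-image retrieval (assumed convention)."""
    sim_t = [list(col) for col in zip(*sim)]  # transpose: texts become the queries
    return sum(recall_at_k(sim, k) + recall_at_k(sim_t, k) for k in ks)


if __name__ == "__main__":
    # Toy embeddings, invented for illustration only.
    image_embs = [[1.0, 0.0, 0.2], [0.1, 1.0, 0.0], [0.0, 0.3, 1.0]]
    text_embs = [[0.9, 0.1, 0.1], [0.0, 0.8, 0.3], [0.2, 0.1, 0.7]]
    sim = similarity_matrix(image_embs, text_embs)
    print("R@1 (image-to-text):", recall_at_k(sim, 1))
    print("R@sum:", r_sum(sim))
```

Under this convention R@sum ranges from 0 to 600, so the percentage gains quoted in the abstract would be read against that scale only if the authors use the same definition.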
