Deep Semantic-Aware Proxy Hashing for Multi-Label Cross-Modal Retrieval

Authors
Yadong Huo, Kezhen Xie, Jiangyan Dai, Lei Wang, Wenfeng Zhang, Lei Huang, Chengduan Wang
Source
Journal: IEEE Transactions on Circuits and Systems for Video Technology [Institute of Electrical and Electronics Engineers]
Volume/Issue: 34 (1): 576-589; Citations: 5
Identifier
DOI: 10.1109/tcsvt.2023.3285266
Abstract

Deep hashing has attracted broad interest in cross-modal retrieval because of its low storage cost and efficient retrieval. To capture the semantic information of raw samples and alleviate the semantic gap, supervised cross-modal hashing methods have been proposed that use label information to map raw samples from different modalities into a unified common space. Despite great progress, existing deep cross-modal hashing methods suffer from two problems: 1) in multi-label cross-modal retrieval, proxy-based methods ignore data-to-data relations and fail to deeply exploit combinations of different categories, which can cause samples that share no common category to be embedded close together; 2) for feature representation, image feature extractors built from multiple convolutional layers cannot fully capture the global information of images, which leads to sub-optimal binary hash codes. In this paper, by extending the proxy-based mechanism to multi-label cross-modal retrieval, we propose a novel Deep Semantic-aware Proxy Hashing (DSPH) framework that embeds multi-modal multi-label data into a uniform discrete space and captures fine-grained semantic relations between raw samples. Specifically, by jointly learning multi-modal multi-label proxy terms and multi-modal irrelevant terms, the semantic-aware proxy loss is designed to capture multi-label correlations and preserve the correct fine-grained similarity ranking among samples, alleviating inter-modal semantic gaps. In addition, for feature representation, two transformer encoders are proposed as backbone networks for images and text, respectively, in which the image transformer encoder captures global information of the input image by modeling long-range visual dependencies.
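The abstract above does not give DSPH's exact loss formulation, but the general idea of a multi-label proxy loss can be illustrated. The sketch below is a generic, hypothetical version (the function name, the margin parameter, and the cosine-similarity formulation are assumptions, not taken from the paper): each class owns a learnable proxy vector, and a sample's embedding is pulled toward the proxies of its positive labels while its similarity to negative-class proxies is pushed below a margin.

```python
import numpy as np

def multilabel_proxy_loss(embedding, proxies, labels, margin=0.5):
    """Illustrative multi-label proxy loss (not DSPH's exact objective).

    embedding: (D,) sample embedding from an image or text encoder.
    proxies:   (C, D) one learnable proxy vector per class.
    labels:    (C,) multi-hot label vector for the sample.
    """
    # Work with cosine similarity, so normalize both sides.
    e = embedding / np.linalg.norm(embedding)
    p = proxies / np.linalg.norm(proxies, axis=1, keepdims=True)
    sims = p @ e                     # similarity to every class proxy
    pos = labels.astype(bool)
    # Pull: positive-class proxies should have similarity near 1.
    pull = np.mean(1.0 - sims[pos])
    # Push: negative-class proxies may not exceed the margin.
    push = np.mean(np.maximum(0.0, sims[~pos] - margin))
    return pull + push
```

Because every sample interacts only with the C proxies rather than with all other samples, a proxy loss of this shape scales linearly with batch size, which is the usual motivation for proxy-based retrieval objectives.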
We have conducted extensive experiments on three benchmark multi-label datasets, and the results show that our DSPH framework outperforms state-of-the-art cross-modal hashing methods. The implementation of our DSPH framework is available at https://github.com/QinLab-WFU/DSPH .
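To make the retrieval side concrete: deep hashing methods such as the one described here binarize continuous embeddings into ±1 codes and rank database items by Hamming distance, which reduces to a dot product for ±1 codes. The snippet below is a minimal, standard sketch of that pipeline (the helper names are assumptions; the paper's actual quantization may differ).

```python
import numpy as np

def to_hash_codes(features):
    """Binarize continuous embeddings with sign(); codes are in {-1, +1}."""
    return np.where(np.asarray(features) >= 0, 1, -1)

def hamming_rank(query_code, db_codes):
    """Rank database codes by Hamming distance to the query.

    For ±1 codes of length K, Hamming distance = (K - dot) / 2,
    so ranking needs only one matrix-vector product.
    """
    K = query_code.shape[0]
    dists = (K - db_codes @ query_code) // 2
    return np.argsort(dists, kind="stable"), dists
```

This constant-time distance computation over compact binary codes is the "low cost and efficient retrieval" benefit the abstract refers to.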