Binary code
Modal verb
Universal hashing
Computer science
Hamming space
Binary number
Differentiable function
Hash function
Dynamic perfect hashing
Theoretical computer science
Source code
Hamming distance
Algorithm
Pattern recognition (psychology)
Hash table
Artificial intelligence
Data mining
Hamming code
Double hashing
Mathematics
Block code
Decoding methods
Polymer chemistry
Operating system
Mathematical analysis
Chemistry
Computer security
Arithmetic
Authors
Junfeng Tu, Xueliang Liu, Zongxiang Lin, Richang Hong, Meng Wang
Identifier
DOI:10.1145/3503161.3548187
Abstract
Cross-modal hashing aims to project cross-modal content into a common Hamming space for efficient search. Most existing work first encodes the samples with a deep network and then binarizes the encoded features into hash codes. However, relative location information in an image may be lost when the image is encoded by a convolutional network, which makes it challenging to model the relationships between different modalities. Moreover, optimizing the model with the discrete sign binarization function popularly used in existing solutions is NP-hard. To address these issues, we propose a differentiable cross-modal hashing method that uses a multimodal transformer as the backbone to capture location information in an image when encoding the visual content. In addition, a novel differentiable binarization scheme is proposed that generates the binary code through a selection mechanism, which can be formulated as a continuous and easily optimized problem. We perform extensive experiments on several cross-modal datasets, and the results show that the proposed method outperforms many existing solutions.
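The abstract does not spell out how the selection mechanism works, so the following is a minimal sketch of one way a differentiable, selection-based binarizer could be built, assuming a Gumbel-softmax-style soft choice between the two candidate bit values {-1, +1}. The class name `SelectiveBinarizer` and all dimensions and hyperparameters are hypothetical illustrations, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectiveBinarizer(nn.Module):
    """Differentiable binarization via a per-bit selection over {-1, +1}.

    Hypothetical sketch: each hash bit is produced by a (Gumbel-)softmax
    choice between the two binary values, so gradients flow through the
    selection weights instead of a non-differentiable sign() function.
    """

    def __init__(self, feat_dim: int, code_len: int, tau: float = 1.0):
        super().__init__()
        # Two selection logits per bit: one for -1, one for +1.
        self.logits = nn.Linear(feat_dim, code_len * 2)
        self.code_len = code_len
        self.tau = tau

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Selection scores, shape (batch, code_len, 2).
        scores = self.logits(x).view(-1, self.code_len, 2)
        if self.training:
            # Soft, differentiable selection (Gumbel-softmax relaxation).
            probs = F.gumbel_softmax(scores, tau=self.tau, hard=False, dim=-1)
        else:
            # Hard selection at inference: exact binary codes.
            probs = F.one_hot(scores.argmax(dim=-1), num_classes=2).float()
        values = torch.tensor([-1.0, 1.0], device=x.device)
        # Weighted sum collapses the two candidates into one code value per bit.
        return (probs * values).sum(dim=-1)

# Example: 64-bit codes from 512-d features (shapes are illustrative).
binarizer = SelectiveBinarizer(feat_dim=512, code_len=64)
soft_codes = binarizer(torch.randn(8, 512))   # training mode: soft, differentiable
binarizer.eval()
hard_codes = binarizer(torch.randn(8, 512))   # inference: exact {-1, +1} codes
```

At training time the model is optimized end to end through the soft selection weights, consistent with the continuous formulation the abstract describes; at inference the hard argmax yields exact binary codes whose pairwise Hamming distances support fast retrieval.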