Computer science
Hash function
Similarity (geometry)
Artificial intelligence
Generative model
Probabilistic logic
Semantic similarity
Natural language processing
Feature hashing
Encoding (set theory)
Machine learning
Pattern recognition (psychology)
Generative grammar
Image (mathematics)
Hash table
Double hashing
Set (abstract data type)
Programming language
Computer security
Authors
Zhenpeng Song,Qinliang Su,Jiayang Chen
Identifier
DOI:10.1145/3581783.3612596
Abstract
Motivated by the strong representation-learning ability of contrastive learning, several recent works have proposed using it to learn semantically rich hash codes. However, in the absence of label information, existing contrastive hashing methods simply follow standard contrastive learning: only the augmentation of the anchor is used as a positive, while all other samples in the batch are treated as negatives, so a large number of potential positives are ignored. Consequently, the learned hash codes tend to be dispersed in the code space, and their distances fail to accurately reflect semantic similarities. To address this issue, we propose to exploit the similarity knowledge and hidden structure of the dataset. Specifically, we first develop an intuitive approach based on self-training that comprises two main components, a pseudo-label predictor and a hash code improving module, which mutually benefit from each other by utilizing one another's output, in conjunction with the similarity knowledge obtained from pre-trained models. Furthermore, we recast the intuitive approach in a more rigorous probabilistic framework and propose CGHash, a probabilistic hashing model based on conditional generative models, which is theoretically better grounded and can model the similarity knowledge and the hidden group structure more accurately. Our extensive experimental results on three image datasets demonstrate that CGHash exhibits significant superiority when compared to both the proposed intuitive approach and existing baselines. Our code is available at https://github.com/KARLSZP/CGHash.
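To make the critiqued setup concrete, the following is a minimal sketch (not the paper's method) of the standard contrastive objective the abstract describes: for each anchor, only its own augmentation is a positive and every other sample in the batch is a negative. The function name, the tanh-relaxed-code assumption, and the temperature value are illustrative choices, not taken from the paper.

```python
import numpy as np

def info_nce_hash_loss(anchors, augmented, temperature=0.5):
    """InfoNCE-style loss over relaxed hash codes.

    anchors, augmented: (batch, bits) arrays of real-valued (e.g. tanh-relaxed)
    codes. For row i, augmented[i] is the only positive; every augmented[j]
    with j != i is treated as a negative -- the assignment that ignores
    potential positives sharing the anchor's semantics.
    """
    # Cosine similarity between every anchor and every augmented code.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    b = augmented / np.linalg.norm(augmented, axis=1, keepdims=True)
    sim = (a @ b.T) / temperature                    # (batch, batch) logits
    # Softmax cross-entropy with the diagonal (own augmentation) as target.
    sim = sim - sim.max(axis=1, keepdims=True)       # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

Because the diagonal is the only target, two semantically similar images in the same batch still push each other apart, which is exactly the dispersion problem the abstract argues CGHash addresses.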