Locality-sensitive hashing
Hash function
Dynamic perfect hashing
Computer science
Universal hashing
Feature hashing
Overfitting
Pattern recognition (psychology)
Artificial intelligence
K-independent hashing
Nearest neighbor search
Linear hashing
Pairwise comparison
Metric (mathematics)
Hash table
Double hashing
Machine learning
Economics
Computer security
Operations management
Artificial neural network
Authors
Jun Wang, Sanjiv Kumar, Shih-Fu Chang
Identifier
DOI:10.1109/tpami.2012.48
Abstract
Hashing-based approximate nearest neighbor (ANN) search in huge databases has become popular due to its computational and memory efficiency. Popular hashing methods, e.g., Locality-Sensitive Hashing and Spectral Hashing, construct hash functions based on random or principal projections. The resulting hashes are either not very accurate or are inefficient. Moreover, these methods are designed for a given metric similarity. In contrast, semantic similarity is usually given in terms of pairwise labels of samples. There exist supervised hashing methods that can handle such semantic similarity, but they are prone to overfitting when labeled data are scarce or noisy. In this work, we propose a semi-supervised hashing (SSH) framework that minimizes empirical error over the labeled set and an information-theoretic regularizer over both labeled and unlabeled sets. Based on this framework, we present three different semi-supervised hashing methods: orthogonal hashing, nonorthogonal hashing, and sequential hashing. In particular, the sequential hashing method generates robust codes in which each hash function is designed to correct the errors made by the previous ones. We further show that the sequential learning paradigm can be extended to unsupervised domains where no labeled pairs are available. Extensive experiments on four large datasets (up to 80 million samples) demonstrate the superior performance of the proposed SSH methods over state-of-the-art supervised and unsupervised hashing techniques.
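The random-projection hashing that the abstract contrasts SSH against can be sketched in a few lines. This is a toy illustration of sign-of-random-projection codes (the basic LSH idea), not the paper's SSH method; the dimensions `d`, `k` and the projection matrix `W` are arbitrary choices made for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 64, 16  # input dimensionality and code length (illustrative values)
W = rng.standard_normal((d, k))  # random projection directions, as in LSH

def hash_bits(x):
    """k-bit binary code: the sign of each random projection of x."""
    return (x @ W > 0).astype(np.uint8)

x = rng.standard_normal(d)
code = hash_bits(x)  # 16 bits: 1 where x falls on the positive side of a hyperplane
```

Nearby points tend to fall on the same side of most random hyperplanes, so their codes differ in few bits; the drawback the abstract notes is that such data-independent projections ignore label information, which is what SSH's learned projections address.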