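Bloom filter
Hash function
Computer science
Double hashing
Hash table
Hash tree
Data structure
Theoretical computer science
Filter (signal processing)
Set (abstract data type)
Hash chain
Algorithm
SHA-2
Data mining
Computer security
Computer vision
Programming language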
Authors
Rongbiao Xie, Meng Li, Zheyu Miao, Rong Gu, He Huang, Haipeng Dai, Guihai Chen
Identifier
DOI:10.1109/icde51399.2021.00061
Abstract
A Bloom filter is a compact, memory-efficient probabilistic data structure that supports membership testing, i.e., checking whether an element is in a given set. However, because a Bloom filter maps each element with uniformly random hash functions, it offers little flexibility even when information about negative keys (elements not in the set) is available. The problem gets worse when misidentifying different negative keys incurs different costs. To address these problems, we propose a new Hash Adaptive Bloom Filter (HABF) that supports the customization of hash functions for keys. The key idea of HABF is to customize the hash functions for positive keys (elements in the set) so as to avoid negative keys with high cost, and to pack the customized hash functions into a lightweight data structure named HashExpressor. Then, given an element at query time, HABF follows a two-round pattern to check whether the element is in the set. Further, we theoretically analyze the performance of HABF and bound its expected false positive rate. We conduct extensive experiments on representative datasets, and the results show that HABF overall outperforms the standard Bloom filter and its cutting-edge variants in terms of accuracy, construction time, query time, and memory consumption (note that the source code is available in [1]).
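For readers unfamiliar with the baseline the abstract compares against, the following is a minimal sketch of a standard Bloom filter. It is not HABF and not the authors' code. The class name `BloomFilter`, the parameters `m` and `k`, and the choice to derive positions by double hashing over a SHA-256 digest are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter with m bit slots and k hash positions per key.

    Illustrative only: names and the hashing scheme are assumptions,
    not taken from the HABF paper.
    """

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit slot, for readability

    def _positions(self, key):
        # Double hashing: position_i = (h1 + i * h2) mod m.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd, so never zero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        # Set every slot the key hashes to.
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key):
        # Report "present" only if every slot for the key is set.
        return all(self.bits[pos] for pos in self._positions(key))


bf = BloomFilter(m=1024, k=4)
positives = ["alice", "bob", "carol"]
for key in positives:
    bf.add(key)

# Positive keys are always reported as present (no false negatives).
print(all(key in bf for key in positives))
```

Every key here is mapped by the same fixed hash scheme, so a negative key whose slots happen to be set is misreported as present regardless of how costly that error is. This is the inflexibility that, according to the abstract, HABF addresses by customizing hash functions per positive key.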