计算机科学
基数(数据建模)
修剪
聚类分析
散列函数
人工神经网络
量化(信号处理)
算法
数据挖掘
理论计算机科学
人工智能
农学
计算机安全
生物
作者
Bolong Zheng,Ziyang Yue,Qi Hu,Xiaomeng Yi,Xiaofan Luan,Charles Xie,Xiaofang Zhou,Christian S. Jensen
标识
DOI:10.1109/icde55515.2023.00246
摘要
Approximate nearest neighbor (ANN) search in high-dimensional space plays an essential role in a variety of real-world applications. A well-known solution to ANN search, inverted file product quantization (IVFPQ) adopts inverted files to avoid exhaustive examination and compresses vectors using product quantization to reduce the space overhead. However, existing implementations use the same fixed probing cardinality (i.e., the number of cells to probe) setting for all queries, which leads to too many or too few cell examinations, thus increasing the average query latency or reducing the recall. To achieve a better trade-off between latency and accuracy, we enable probing cardinality estimation for high-dimensional ANN search by using deep learning techniques. We develop HBK-means, a hierarchical balanced clustering algorithm that reduces the data distribution imbalance of cells to enable a better estimation. Next, we develop PCE-Net, an encoder-decoder based neural network for estimating query-dependent minimum probing cardinality. In addition, we introduce two query optimization strategies: lower bound sorting based pruning (LBS-Pruning) and early termination (ET), to further reduce query latency. Extensive experiments with real-world data offer evidence that the proposed solution is capable of achieving better performance than IVFPQ and its variants.
科研通智能强力驱动
Strongly Powered by AbleSci AI