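Computer science
Hash function
Artificial intelligence
Feature learning
Similarity (geometry)
Unsupervised learning
Machine learning
Natural language processing
Computer security
Image (mathematics)
Authors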
Lina Sun,Yewen Li,Yumin Dong
Identifier
DOI:10.1145/3591106.3592242
Abstract
Unsupervised cross-modal hashing (UCMH) has attracted increasing research attention due to its efficient retrieval performance and independence from labels. However, existing methods have two bottlenecks. First, they suffer from inaccurate similarity measures, because features of different modalities lack correlation and simple features cannot fully describe the fine-grained relationships in multi-modal data. Second, they have rarely explored vision-language knowledge distillation schemes that distil the multi-modal knowledge of vision-language models to guide the learning of student networks. To address these bottlenecks, this paper proposes an effective unsupervised cross-modal hashing retrieval method, called Vision-Language Knowledge Distillation for Unsupervised Cross-Modal Hashing Retrieval (VLKD). VLKD uses a vision-language pre-training (VLP) model to encode features of multi-modal data, and then constructs a similarity matrix to provide soft similarity supervision for the student model. In this way it distils the knowledge of the VLP model into the student model, which thereby gains an understanding of multi-modal knowledge. In addition, we design an end-to-end unsupervised hashing learning model that incorporates a graph convolutional auxiliary network. The auxiliary network aggregates information from similar data nodes, based on the similarity matrix distilled by the teacher model, to generate more consistent hash codes. Finally, the teacher network does not require additional training; it only needs to guide the student network to learn high-quality hash representations, and VLKD is quite efficient in training and retrieval. Extensive experiments on three multimedia retrieval benchmark datasets show that the proposed method achieves better retrieval performance than existing unsupervised cross-modal hashing methods, demonstrating its effectiveness.
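The abstract gives no formulas, so the following is only a minimal sketch of the general idea of soft similarity supervision from a frozen teacher, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the toy feature vectors standing in for VLP embeddings, the weighted fusion of image and text similarities through a hypothetical `alpha` parameter, and the mean-squared-error loss. The graph convolutional auxiliary network is omitted.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cosine_matrix(a, b):
    """Pairwise cosine similarities between the rows of a and the rows of b."""
    a = [normalize(v) for v in a]
    b = [normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, v)) for v in b] for u in a]

def teacher_similarity(img_feats, txt_feats, alpha=0.5):
    """Fuse image-image and text-text similarities of frozen teacher features.

    The weighted sum with alpha is an assumed fusion rule, not taken from the paper.
    """
    s_img = cosine_matrix(img_feats, img_feats)
    s_txt = cosine_matrix(txt_feats, txt_feats)
    n = len(img_feats)
    return [[alpha * s_img[i][j] + (1 - alpha) * s_txt[i][j] for j in range(n)]
            for i in range(n)]

def distillation_loss(h_img, h_txt, S):
    """Mean squared error between student code similarities and the teacher matrix S."""
    pred = cosine_matrix(h_img, h_txt)
    n = len(S)
    return sum((pred[i][j] - S[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)

def binarize(h):
    """Turn relaxed (real-valued) student outputs into +1/-1 hash codes."""
    return [[1 if x >= 0 else -1 for x in row] for row in h]

# Toy stand-ins for teacher embeddings of three image-text pairs (assumed values).
img_feats = [[1.0, 0.0, 0.2], [0.9, 0.1, 0.1], [0.0, 1.0, 0.3]]
txt_feats = [[0.8, 0.1, 0.0], [1.0, 0.0, 0.1], [0.1, 0.9, 0.2]]
S = teacher_similarity(img_feats, txt_feats)

# Toy relaxed outputs of the student image and text hashing networks (assumed values).
h_img = [[0.7, -0.2, 0.5, -0.9], [0.6, -0.1, 0.4, -0.8], [-0.5, 0.8, -0.3, 0.6]]
h_txt = [[0.8, -0.3, 0.4, -0.7], [0.5, -0.2, 0.6, -0.9], [-0.6, 0.7, -0.2, 0.5]]

loss = distillation_loss(h_img, h_txt, S)
codes = binarize(h_img)
```

In this sketch the teacher features are computed once and never updated, which mirrors the abstract's statement that the teacher network needs no additional training; only the student outputs would be optimized against `S`.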