F-SCP: An automatic prompt generation method for specific classes based on visual language pre-training models

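Computer science; Class (philosophy); Context (archaeology); Filter (signal processing); Artificial intelligence; Domain (mathematical analysis); Language model; Machine learning; Variation (astronomy); Natural language processing; Computer vision; Mathematics; Paleontology; Mathematical analysis; Physics; Astrophysics; Biology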

Authors

Bo Han,Xiaoyan Jiang,Zhijun Fang,Hamido Fujita,Yongbin Gao

Source

Journal: Pattern Recognition [Elsevier BV]
Date: 2024-03-01 Volume/issue: 147: 110096-110096 Citations: 1

Identifier

DOI: 10.1016/j.patcog.2023.110096

Abstract

The zero-shot classification performance of large-scale vision-language pre-training models (e.g., CLIP, BLIP and ALIGN) can be enhanced by incorporating a prompt (e.g., “a photo of a [CLASS]”) before the class words. Even a slight modification of the prompt can have a significant effect on the classification outcomes of these models, so it is crucial to include an appropriate prompt tailored to the classes. However, manual prompt design is labor-intensive and requires domain-specific expertise. CoOp (Context Optimization) converts hand-crafted prompt templates into learnable word vectors to generate prompts automatically, resulting in substantial improvements for CLIP. However, CoOp exhibits significant variation in classification performance across classes. Although CoOp-CSC (Class-Specific Context) has a separate prompt for each class, it shows advantages only on fine-grained datasets. In this paper, we propose a novel automatic prompt generation method called F-SCP (Filter-based Specific Class Prompt), which differs from both the CoOp-UC (Unified Context) model and the CoOp-CSC model. Our approach focuses on prompt generation for low-accuracy classes and similar classes. We add Filter and SCP (Specific Class Prompt) modules to the prompt generation architecture. The Filter module selects the poorly classified classes, and the SCP module then regenerates their prompts to replace the prompts of those specific classes. Experimental results on six multi-domain datasets show the superiority of our approach over state-of-the-art methods. In particular, the improvement in accuracy for the specific classes mentioned above is significant. For instance, compared with CoOp-UC on the OxfordPets dataset, low-accuracy classes such as Class21 and Class26 are improved by 18% and 12%, respectively.
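
The filter-then-replace idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the Filter module decides that a class is poorly classified, so the accuracy threshold used here is an assumption. Plain text templates stand in for the learnable prompt vectors that CoOp-style methods optimize, and all function names, class names and prompt strings are hypothetical. The sketch covers only the low-accuracy case, not the similar-class case the abstract also mentions.

```python
from collections import defaultdict


def per_class_accuracy(labels, predictions):
    """Return {class_id: accuracy} from paired label/prediction lists."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p in zip(labels, predictions):
        total[y] += 1
        correct[y] += int(y == p)
    return {c: correct[c] / total[c] for c in total}


def filter_low_accuracy(acc_by_class, threshold):
    """Hypothetical Filter step: keep classes whose accuracy is below threshold."""
    return sorted(c for c, a in acc_by_class.items() if a < threshold)


def assemble_prompts(class_names, unified_prompt, specific_prompts, selected):
    """Use a class-specific template for selected classes, the unified one otherwise."""
    prompts = {}
    for c, name in class_names.items():
        template = specific_prompts[c] if c in selected else unified_prompt
        prompts[c] = template.replace("[CLASS]", name)
    return prompts


# Toy validation results for three made-up classes (assumed data).
labels      = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
predictions = [0, 0, 0, 0, 1, 2, 2, 1, 2, 2, 2, 1]

acc = per_class_accuracy(labels, predictions)
selected = filter_low_accuracy(acc, threshold=0.6)  # threshold is an assumption

class_names = {0: "beagle", 1: "pug", 2: "boxer"}
specific_prompts = {1: "a close-up photo of a [CLASS], a type of pet"}
prompts = assemble_prompts(class_names, "a photo of a [CLASS]",
                           specific_prompts, selected)

for c in sorted(prompts):
    print(c, round(acc[c], 2), prompts[c])
```

In this toy run only class 1 falls below the threshold, so it alone receives the class-specific template while the other classes keep the unified prompt.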
