Computer science
Leverage (statistics)
Language model
Transfer of learning
Task (project management)
Benchmark (surveying)
Artificial intelligence
Training set
Transfer (computing)
Shot (pellet)
Key (lock)
Code (set theory)
Natural language processing
Machine learning
Programming language
Parallel computing
Chemistry
Computer security
Management
Geodesy
Organic chemistry
Set (abstract data type)
Economics
Geography
Authors
Vishaal Udandarao, Ankush Gupta, Samuel Albanie
Identifier
DOI: 10.1109/iccv51070.2023.00257
Abstract
Contrastive Language-Image Pre-training (CLIP) has emerged as a simple yet effective way to train large-scale vision-language models. CLIP demonstrates impressive zero-shot classification and retrieval performance on diverse downstream tasks. However, to leverage its full potential, fine-tuning still appears to be necessary. Fine-tuning the entire CLIP model can be resource-intensive and unstable. Moreover, recent methods that aim to circumvent this need for fine-tuning still require access to images from the target task distribution. In this paper, we pursue a different approach and explore the regime of training-free "name-only transfer" in which the only knowledge we possess about the downstream task comprises the names of downstream target categories. We propose a novel method, SuS-X, consisting of two key building blocks, "SuS" and "TIP-X", that requires neither intensive fine-tuning nor costly labelled data. SuS-X achieves state-of-the-art (SoTA) zero-shot classification results on 19 benchmark datasets. We further show the utility of TIP-X in the training-free few-shot setting, where we again achieve SoTA results over strong training-free baselines. Code is available at https://github.com/vishaal27/SuS-X.
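To make the "name-only" setting concrete, the sketch below shows plain CLIP zero-shot classification driven solely by category names, the baseline regime the abstract builds on. It is a minimal illustration, not the SuS-X method itself: it assumes the open-source `clip` package (https://github.com/openai/CLIP) and PyTorch are installed, and the class names, prompt template, and image path are hypothetical placeholders.

```python
# Minimal sketch: CLIP zero-shot classification from category names only.
# Assumes the `clip` package (github.com/openai/CLIP) and PyTorch.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical target categories; in name-only transfer these names are
# the only knowledge available about the downstream task.
class_names = ["dog", "cat", "bird"]
prompts = [f"a photo of a {name}" for name in class_names]
text_tokens = clip.tokenize(prompts).to(device)

# Placeholder image path for illustration.
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text_tokens)
    # L2-normalise, then score each class prompt by cosine similarity.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    logits = 100.0 * image_features @ text_features.T
    probs = logits.softmax(dim=-1)

print("Predicted class:", class_names[probs.argmax(dim=-1).item()])
```

SuS-X improves on this baseline without any training by curating a support set ("SuS") for the named categories and rescoring with TIP-X; see the linked repository for the authors' implementation.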