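Computer science
Leverage (statistics)
Artificial intelligence
Semantics (computer science)
Generative grammar
Cascade
Shot (pellet)
Natural language processing
Cache
Programming language
Engineering
Chemistry
Organic chemistry
Chemical engineering
Operating system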
Authors
Renrui Zhang,Xiangfei Hu,Bohao Li,Siyuan Huang,Hanqiu Deng,Hongsheng Li,Yu Qiao,Peng Gao
Source
Journal: Cornell University - arXiv
Date: 2023-03-03
Identifier
DOI: 10.48550/arxiv.2303.02151
Abstract
Visual recognition in low-data regimes requires deep neural networks to learn generalized representations from limited training samples. Recently, CLIP-based methods have shown promising few-shot performance, benefiting from contrastive language-image pre-training. We then ask whether more diverse pre-training knowledge can be cascaded to further assist few-shot representation learning. In this paper, we propose CaFo, a Cascade of Foundation models that incorporates diverse prior knowledge from various pre-training paradigms for better few-shot learning. CaFo incorporates CLIP's language-contrastive knowledge, DINO's vision-contrastive knowledge, DALL-E's vision-generative knowledge, and GPT-3's language-generative knowledge. Specifically, CaFo works by 'Prompt, Generate, then Cache'. First, we leverage GPT-3 to produce textual inputs for prompting CLIP with rich downstream linguistic semantics. Then, we generate synthetic images via DALL-E to expand the few-shot training data without any manual effort. Finally, we introduce a learnable cache model to adaptively blend the predictions from CLIP and DINO. Through such collaboration, CaFo can fully unleash the potential of different pre-training methods and unify them to achieve state-of-the-art performance for few-shot classification. Code is available at https://github.com/ZrrSkywalker/CaFo.
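The abstract does not spell out how the cache model blends the CLIP and DINO predictions, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a key-value cache in which few-shot features act as keys and their class labels as values, and it assumes each cache prediction is weighted by how closely it agrees with a zero-shot prediction. The function names (cache_logits, blend), the parameters beta and alpha, and all feature vectors are hypothetical stand-ins; real features would come from the CLIP and DINO encoders, and nothing here is taken from the CaFo repository.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def cache_logits(query, keys, labels, num_classes, beta=5.0):
    """Key-value cache lookup: each cached few-shot sample votes for its
    class with a weight that grows with its cosine similarity to the query.
    beta (hypothetical) controls how sharply the weight decays."""
    q = normalize(query)
    out = [0.0] * num_classes
    for key, label in zip(keys, labels):
        affinity = math.exp(-beta * (1.0 - dot(q, normalize(key))))
        out[label] += affinity
    return out


def blend(zero_shot, cache_a, cache_b, alpha=1.0):
    """Assumed blending rule: weight each cache prediction by its agreement
    with the zero-shot prediction, then add the weighted sum to it."""
    p0 = softmax(zero_shot)
    agreement = [dot(p0, softmax(c)) for c in (cache_a, cache_b)]
    w = softmax(agreement)  # the two blending weights sum to 1
    return [z + alpha * (w[0] * a + w[1] * b)
            for z, a, b in zip(zero_shot, cache_a, cache_b)]


# Toy data: two classes, one cached sample per class, made-up features.
labels = [0, 1]
clip_keys = [[1.0, 0.0], [0.0, 1.0]]            # stand-in for CLIP features
dino_keys = [[1.0, 0.2, 0.0], [0.0, 0.1, 1.0]]  # stand-in for DINO features

zero_shot = [0.6, 0.4]  # stand-in for CLIP zero-shot logits
clip_pred = cache_logits([0.9, 0.1], clip_keys, labels, num_classes=2)
dino_pred = cache_logits([0.8, 0.1, 0.1], dino_keys, labels, num_classes=2)

blended = blend(zero_shot, clip_pred, dino_pred)
predicted_class = blended.index(max(blended))
print(predicted_class, blended)
```

In this toy setup the query lies close to the class-0 keys in both feature spaces, so both cache predictions and the blended result favor class 0. In a learnable variant, the cached keys would be treated as trainable parameters and fine-tuned on the few-shot data.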