Computer science
Personalized learning
Human-computer interaction
Artificial intelligence
Mathematics education
Psychology
Teaching methods
Open learning
Cooperative learning
Authors
Tao Guo, Song Guo, Junxiao Wang
Identifier
DOI:10.1145/3543507.3583518
Abstract
Pre-trained vision-language models such as CLIP show great potential for learning representations that capture latent characteristics of users. A recently proposed method, Context Optimization (CoOp), introduces trainable prompts for adapting pre-trained vision-language models. Given the lightweight nature of this method, researchers have migrated the paradigm from centralized to decentralized settings under the collaborative training framework of Federated Learning (FL). However, current prompt training in FL mainly focuses on modeling user consensus and lacks adaptation to individual user characteristics, leaving prompt personalization largely under-explored. Research over the past few years has applied personalized FL (pFL) approaches to customize models for heterogeneous users. Unfortunately, we find that, owing to differences in modality and training behavior, directly applying pFL methods to prompt training leads to insufficient personalization and degraded performance. To bridge this gap, we present pFedPrompt, which leverages the unique advantage of multimodality in vision-language models by learning user consensus in the linguistic space and adapting to user characteristics in the visual space in a non-parametric manner. Through this dual collaboration, the learned prompt is fully personalized and aligned with the user's local characteristics. We conduct extensive experiments across various datasets under the FL setting with statistical heterogeneity. The results demonstrate that pFedPrompt robustly outperforms the alternative approaches.
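To make the two ingredients in the abstract concrete, the sketch below illustrates (i) CoOp-style trainable prompt vectors for the linguistic branch and (ii) a non-parametric visual cache, in PyTorch. The abstract does not disclose pFedPrompt's exact architecture, so the shapes, the `temperature` and `beta` values, and the Tip-Adapter-style cache used here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PromptLearner(nn.Module):
    """CoOp-style context optimization: the only trainable parameters are a
    few continuous context vectors prepended to each (frozen) class-name
    embedding; the vision-language backbone itself is never updated.
    Shapes here are hypothetical, for illustration only."""

    def __init__(self, n_ctx: int, ctx_dim: int, class_embeds: torch.Tensor):
        super().__init__()
        # Shared learnable context vectors, shape (n_ctx, ctx_dim).
        self.ctx = nn.Parameter(0.02 * torch.randn(n_ctx, ctx_dim))
        # Frozen per-class token embeddings, shape (n_classes, ctx_dim).
        self.register_buffer("class_embeds", class_embeds)

    def forward(self) -> torch.Tensor:
        n_classes = self.class_embeds.size(0)
        # Prepend the shared context to every class embedding:
        # result shape (n_classes, n_ctx + 1, ctx_dim).
        ctx = self.ctx.unsqueeze(0).expand(n_classes, -1, -1)
        return torch.cat([ctx, self.class_embeds.unsqueeze(1)], dim=1)


def text_branch_logits(image_feats, text_feats, temperature=0.01):
    # Standard CLIP-style cosine-similarity logits (linguistic branch).
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    return image_feats @ text_feats.t() / temperature


def visual_branch_logits(query_feats, cached_feats, cached_onehot, beta=5.0):
    """One way to realize non-parametric visual adaptation: a local feature
    cache whose keys are image features of a client's own samples and whose
    values are their one-hot labels. Predictions then adapt to local data
    without introducing any trainable parameters."""
    q = F.normalize(query_feats, dim=-1)
    k = F.normalize(cached_feats, dim=-1)
    affinity = torch.exp(-beta * (1.0 - q @ k.t()))  # (batch, n_cached)
    return affinity @ cached_onehot                  # (batch, n_classes)
```

Under one plausible reading of the abstract, a client would train only `PromptLearner.ctx` and share it for aggregation (user consensus in the linguistic space), while the visual cache stays strictly local, which is what keeps the per-user adaptation non-parametric.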