Adapter (computing)
Computer science
Artificial intelligence
Image (mathematics)
Computer vision
Computer hardware
Authors
Ruirui Wei, Chunxiao Fan, Yuexin Wu
Identifier
DOI:10.1007/978-3-031-44210-0_18
Abstract
Research on vision-language models has seen rapid development, enabling natural-language-based image generation and manipulation. Existing text-driven image manipulation is typically implemented via GAN inversion or by fine-tuning diffusion models. The former is limited by the inversion capability of GANs, which fail to reconstruct images with novel poses and perspectives. The latter requires expensive optimization for each input, and fine-tuning remains a complex process. To mitigate these problems, we propose a novel approach, dubbed Diffusion-Adapter, which performs text-driven image manipulation using frozen pre-trained diffusion models. In this work, we design an Adapter architecture that modifies the target attributes without fine-tuning the pre-trained models. Our approach can be applied to diffusion models in any domain and requires only a few examples to train the Adapter, which can then successfully edit images from unseen data. Compared with previous work, Diffusion-Adapter preserves a maximal amount of detail from the original image without unintended changes to the input content. Extensive experiments demonstrate the advantages of our approach over competing baselines.
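The abstract does not specify the Adapter's internal design, but the general adapter pattern it builds on — a small trainable residual branch attached to a frozen pre-trained backbone — can be sketched as below. All names (`FrozenLayer`, `Adapter`, the bottleneck width) are illustrative assumptions, not the paper's actual architecture; a minimal numpy sketch stands in for a real diffusion U-Net block.

```python
import numpy as np

rng = np.random.default_rng(0)

class FrozenLayer:
    """Stands in for one block of a pre-trained diffusion model; its weights are never updated."""
    def __init__(self, dim):
        self.W = rng.standard_normal((dim, dim)) / np.sqrt(dim)

    def __call__(self, x):
        return x @ self.W

class Adapter:
    """Bottleneck residual branch (down-project, ReLU, up-project).

    Only these small matrices would be trained; zero-initializing the
    up-projection makes the adapter start as an exact identity, so the
    frozen model's behavior is unchanged before training.
    """
    def __init__(self, dim, bottleneck=4):
        self.down = rng.standard_normal((dim, bottleneck)) * 0.01
        self.up = np.zeros((bottleneck, dim))  # zero-init: residual adds nothing initially

    def __call__(self, x):
        h = np.maximum(x @ self.down, 0.0)
        return x + h @ self.up

dim = 8
layer, adapter = FrozenLayer(dim), Adapter(dim)
x = rng.standard_normal((2, dim))
base = layer(x)          # frozen forward pass
out = adapter(base)      # adapter wraps the frozen output
# Before any training, the zero-initialized adapter is an identity map
assert np.allclose(out, base)
```

The key property this sketch illustrates is that editing capability is added by training only the adapter parameters, leaving the expensive pre-trained weights untouched — which is what allows the method to avoid per-input optimization.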