Inpainting
Security token
Computer science
Control (management)
Image (mathematics)
Artificial intelligence
Computer vision
Computer security
Authors
Ruichen Wang, J. L. Zhang, Qingsong Xie, Chen Chen, H. Lu
Source
Journal: Cornell University - arXiv
Date: 2024-12-02
Identifier
DOI: 10.48550/arxiv.2412.01223
Abstract
Recently, diffusion models have shown superior performance in image inpainting. Diffusion-based inpainting methods can usually generate realistic, high-quality content for masked regions. However, owing to the limitations of diffusion models, existing methods typically struggle with two problems: semantic consistency between the image and the text prompt, and matching users' editing habits. To address these issues, we present PainterNet, a plugin that can be flexibly embedded into various diffusion models. To generate content in the masked region that closely aligns with the user's prompt, we propose local prompt input, Attention Control Points (ACP), and the Actual-Token Attention Loss (ATAL), which strengthen the model's focus on local regions. Additionally, we redesign the mask generation algorithm used for the training and test datasets to simulate how users apply masks, and we introduce a new, customized training dataset, PainterData, and a benchmark dataset, PainterBench. Extensive experiments show that PainterNet surpasses existing state-of-the-art models on key metrics, including image quality and global/local text consistency.