Computer science, Artificial intelligence, Computer vision, Image translation, Clothing, Flexibility (engineering), Image (mathematics), Focus (optics), Statistics, Physics, Mathematics, Archaeology, Optics, History
Authors
Shidong Cao, Wenhao Chai, Shengyu Hao, Yanting Zhang, Hangyue Chen, Gaoang Wang
Identifier
DOI: 10.1109/TMM.2023.3318297
Abstract
Image-based fashion design with AI techniques has attracted increasing attention in recent years. We focus on the reference-based fashion design task, where we aim to combine a reference appearance image and a clothing image to generate a new fashion clothing image. Although existing diffusion-based image translation methods enable flexible style transfer, it is often difficult to transfer the appearance of the reference image realistically during reverse diffusion. When the reference appearance domain differs greatly from the source domain, the translation often collapses. To tackle this issue, we present DiffFashion, a novel diffusion-model-based unsupervised structure-aware transfer method. Our method requires no model tuning, preserves structure, and offers high flexibility in transferring from images with large domain gaps. Specifically, based on optimal transport properties, we keep a shared latent across the clothing image and the reference appearance image to bridge the gap between the two domains in the denoising process, and the latent of the reference image is gradually adapted to the clothing domain. Simultaneously, the structure is transferred from the source clothing to the output fashion image with mixed guidance, including pre-trained Vision Transformer (ViT) guidance and foreground mask guidance, to further preserve the structure and appearance semantics of the source and reference images. Our experimental results show that the proposed method outperforms state-of-the-art baseline models, generating more realistic images in the fashion design task. Code and demo can be found at https://github.com/Rem105-210/DiffFashion .
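To illustrate the kind of mixed-guidance denoising the abstract describes, the following is a minimal, self-contained sketch, not the authors' implementation: one reverse-diffusion step whose denoised estimate is nudged by gradients of a ViT-feature structure loss against the source clothing image and a foreground-mask background loss. The module names (`denoiser`, `vit`), the specific loss forms, and the guidance weights are assumptions made for illustration; DiffFashion's actual guidance terms are defined in the paper and repository linked above.

```python
import torch
import torch.nn.functional as F

def guided_denoise_step(x_t, t, denoiser, vit, src_clothing, fg_mask,
                        alpha_bar, w_struct=1.0, w_mask=1.0):
    """One gradient-guided denoising step (illustrative only, not DiffFashion's code).

    x_t          : noisy latent at step t, shape (B, C, H, W)
    denoiser     : callable predicting the noise eps(x_t, t)
    vit          : frozen feature extractor used as a structure critic
    src_clothing : source clothing image supplying structure / background
    fg_mask      : foreground mask (1 = clothing region), shape (B, 1, H, W)
    alpha_bar    : cumulative alpha product at step t (scalar tensor)
    """
    x_t = x_t.detach().requires_grad_(True)

    # Standard DDPM identity: recover the denoised estimate x0_hat from the predicted noise.
    eps = denoiser(x_t, t)
    x0_hat = (x_t - torch.sqrt(1.0 - alpha_bar) * eps) / torch.sqrt(alpha_bar)

    # Structure guidance: keep ViT features of the estimate close to the source clothing.
    struct_loss = F.mse_loss(vit(x0_hat), vit(src_clothing))

    # Mask guidance: outside the clothing foreground, stay close to the source image.
    mask_loss = F.mse_loss(x0_hat * (1.0 - fg_mask), src_clothing * (1.0 - fg_mask))

    loss = w_struct * struct_loss + w_mask * mask_loss
    grad = torch.autograd.grad(loss, x_t)[0]

    # Classifier-guidance-style correction: push the estimate against the loss gradient.
    return (x0_hat - grad).detach()


if __name__ == "__main__":
    B, C, H, W = 1, 3, 64, 64
    x_t = torch.randn(B, C, H, W)
    src = torch.rand(B, C, H, W)
    mask = (torch.rand(B, 1, H, W) > 0.5).float()
    denoiser = lambda x, t: torch.zeros_like(x)              # dummy noise predictor
    vit = lambda x: F.adaptive_avg_pool2d(x, 8).flatten(1)   # dummy "ViT" feature map
    out = guided_denoise_step(x_t, torch.tensor(10), denoiser, vit, src, mask,
                              alpha_bar=torch.tensor(0.5))
    print(out.shape)  # torch.Size([1, 3, 64, 64])
```

In the method described by the abstract, such guidance runs at each denoising step while a shared latent bridges the clothing and reference appearance domains; the sketch above only shows the per-step guidance pattern under those stated assumptions.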