Inpainting
Computer science
Artificial intelligence
Semantics (computer science)
Image (mathematics)
Discriminator
Focus (optics)
Encoder
Object (grammar)
Pixel
Natural language processing
Computer vision
Pattern recognition (psychology)
Sentence
Word (group theory)
Mathematics
Operating system
Detector
Optics
Physics
Telecommunications
Programming language
Geometry
Authors
Xingcai Wu, Kejun Zhao, Qianding Huang, Qi Wang, Zhenguo Yang, Ge-Fei Hao
Identifier
DOI:10.1016/j.patcog.2023.109961
Abstract
Text-guided image inpainting aims to generate plausible content for the corrupted patches of an image from textual descriptions, exploiting the relationship between textual and visual semantics. Existing works predict the missing patches from the residual pixels of corrupted images alone, ignoring the visual semantics of the objects of interest that correspond to the textual descriptions. In this paper, we propose a text-guided image inpainting method with multi-grained image-text semantic learning (MISL), consisting of global-local generators and discriminators. Specifically, we devise hierarchical learning (HL) with global-coarse-grained, object-fine-grained, and global-fine-grained learning stages in the global-local generators to refine corrupted images from global to local. In particular, the object-fine-grained stage focuses on the visual semantics of the objects of interest in corrupted images via an encoder-decoder network with self-attention blocks. In addition, we design a mask reconstruction (MR) module to further guide the restoration of the objects of interest corresponding to the given textual descriptions. To inject textual semantics into the global-local generators, we implement a multi-attention (MA) module that incorporates word-level and sentence-level textual features to generate three images of different granularities. For training, we exploit a global discriminator and a flexible discriminator to penalize the whole image and the corrupted region, respectively. Extensive experiments on four datasets show that the proposed MISL outperforms existing methods.
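The multi-attention (MA) idea of fusing word-level and sentence-level text features into image features can be illustrated with a minimal sketch. This is not the authors' implementation; the shapes, function name, and additive fusion are assumptions chosen for clarity: each spatial location of the image feature map attends over the word embeddings (word-level), while the sentence embedding is broadcast to all locations (sentence-level).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_attention(img_feat, word_feats, sent_feat):
    """Hypothetical multi-grained text-to-image fusion.

    img_feat:   (N, d) - N spatial locations of an image feature map
    word_feats: (T, d) - T word embeddings of the description
    sent_feat:  (d,)   - a single sentence embedding
    """
    scores = img_feat @ word_feats.T        # (N, T) location-word similarity
    attn = softmax(scores, axis=-1)         # each location attends over words
    word_ctx = attn @ word_feats            # (N, d) word-level text context
    # Fuse both granularities additively; sent_feat broadcasts over locations.
    return img_feat + word_ctx + sent_feat
```

A real MA module would learn projection matrices for the queries, keys, and values rather than using raw features, but the attention-then-fuse structure is the same.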