Computer science
Generative grammar
Generative adversarial network
Adversarial system
Artificial intelligence
Image (mathematics)
Pattern recognition (psychology)
Theoretical computer science
Authors
Junpeng Liu, Hengkang Bao
Identifier
DOI: 10.1007/978-3-031-53311-2_22
Abstract
Text-to-image synthesis has been a popular multimodal task in recent years, and it faces two major challenges: maintaining semantic consistency and avoiding the loss of fine-grained information. Existing methods mostly adopt either a multi-stage stacked architecture or a single-stream model with several affine transformations as the fusion block. The former requires additional networks to ensure semantic consistency between text and image, which is complex and results in poor generation quality. The latter simply derives the affine transformation from Conditional Batch Normalization (CBN), which cannot match text features well. To address these issues, we propose an effective Conditional Adaptive Generative Adversarial Network (CA-GAN). Our method adopts a single-stream network architecture consisting of a single generator/discriminator pair. Specifically, we propose: (1) a conditional adaptive instance normalization residual block that enables the generator to synthesize high-quality images containing semantic information; (2) an attention block that focuses on image-related channels and pixels. We conduct extensive experiments on the CUB and COCO datasets, and the results show the superiority of the proposed CA-GAN over previous methods on text-to-image synthesis tasks.
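The abstract does not give layer-level details of the conditional adaptive instance normalization residual block. As a hedged illustration only, the sketch below shows one common way such a block can be built in PyTorch: each feature map is instance-normalized without learned affine parameters, and a per-channel scale and shift are instead predicted from the sentence embedding. The class names (`CondAdaIN`, `CAINResBlock`), the `text_dim` parameter, and the linear-projection design are assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class CondAdaIN(nn.Module):
    """Conditional adaptive instance normalization (illustrative sketch).

    Normalizes each feature map per instance, then modulates it with a
    scale and shift predicted from the text embedding, injecting the
    text condition at every resolution of the generator.
    """
    def __init__(self, num_channels: int, text_dim: int):
        super().__init__()
        # Parameter-free instance norm; gamma/beta come from the text.
        self.norm = nn.InstanceNorm2d(num_channels, affine=False)
        self.to_gamma = nn.Linear(text_dim, num_channels)
        self.to_beta = nn.Linear(text_dim, num_channels)

    def forward(self, x: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # Reshape (B, C) projections to (B, C, 1, 1) for broadcasting.
        gamma = self.to_gamma(text_emb).unsqueeze(-1).unsqueeze(-1)
        beta = self.to_beta(text_emb).unsqueeze(-1).unsqueeze(-1)
        return (1 + gamma) * self.norm(x) + beta

class CAINResBlock(nn.Module):
    """Residual block wrapping two CondAdaIN layers (hypothetical layout)."""
    def __init__(self, channels: int, text_dim: int):
        super().__init__()
        self.norm1 = CondAdaIN(channels, text_dim)
        self.norm2 = CondAdaIN(channels, text_dim)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        h = self.conv1(self.act(self.norm1(x, text_emb)))
        h = self.conv2(self.act(self.norm2(h, text_emb)))
        return x + h
```

Injecting the text condition through the normalization statistics of every block is what lets a single-stream generator fuse text and image features without the stacked multi-stage pipeline that the abstract argues against.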
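Likewise, an attention block that "focuses on image-related channels and pixels" could, under a squeeze-and-excitation-style reading, be sketched as a channel gate followed by a per-pixel gate. This is an assumption about the design, not the paper's actual block; the `reduction` ratio and gating layout are illustrative.

```python
class ChannelPixelAttention(nn.Module):
    """Channel attention followed by pixel attention (illustrative sketch).

    The channel gate reweights feature maps from globally pooled
    statistics; the pixel gate then produces a per-location weight, so
    the block emphasizes image-relevant channels and spatial positions.
    """
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                    # (B, C, 1, 1)
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.pixel_gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),      # (B, 1, H, W)
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_gate(x)   # emphasize informative channels
        x = x * self.pixel_gate(x)     # emphasize informative pixels
        return x
```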