Keywords
Computer science, Adversarial system, Generative grammar, Fusion, Artificial intelligence, Generative adversarial network, Image fusion, Image (mathematics), Pattern recognition (psychology), Computer vision, Theoretical computer science, Linguistics, Philosophy
Authors
Bing Yang,Xueqin Xiang,Wanzeng Kong,Jianhai Zhang,Yong Peng
Identifier
DOI: 10.1109/tmm.2024.3358086
Abstract
Text-to-image synthesis aims to generate high-quality, realistic images conditioned on a text description. The key challenge of this task lies in deeply and seamlessly integrating image and text information. In this paper, we therefore propose a deep multimodal fusion generative adversarial network (DMF-GAN) that enables effective semantic interaction for fine-grained text-to-image generation. Specifically, through a novel recurrent semantic fusion network, DMF-GAN consistently governs the global assignment of text information across otherwise isolated fusion blocks. With the assistance of a multi-head attention module, DMF-GAN models word information from different perspectives and further improves semantic consistency. In addition, a word-level discriminator is proposed to provide the generator with fine-grained feedback on each word. Compared with current state-of-the-art methods, our proposed DMF-GAN efficiently synthesizes realistic, text-aligned images and achieves better performance on challenging benchmarks. Code: https://github.com/xueqinxiang/DMF-GAN
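To make the fusion mechanism concrete, below is a minimal PyTorch sketch of the general idea the abstract describes: image features attend to word embeddings through multi-head attention, and the attended text context is fused back into the image feature map. This is not the authors' implementation (which is at the linked repository); the class name `WordFusionBlock` and the dimensions `img_dim`, `txt_dim`, and `num_heads` are illustrative assumptions.

```python
import torch
import torch.nn as nn

class WordFusionBlock(nn.Module):
    """Hypothetical sketch of word-level text-image fusion via multi-head attention."""

    def __init__(self, img_dim: int = 256, txt_dim: int = 256, num_heads: int = 4):
        super().__init__()
        # Project word embeddings into the image feature space.
        self.txt_proj = nn.Linear(txt_dim, img_dim)
        # Multi-head attention lets each spatial location attend to the words
        # "from different perspectives", as the abstract puts it.
        self.attn = nn.MultiheadAttention(img_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(img_dim)

    def forward(self, img_feat: torch.Tensor, word_emb: torch.Tensor) -> torch.Tensor:
        # img_feat: (B, C, H, W) feature map; word_emb: (B, L, txt_dim) word embeddings.
        b, c, h, w = img_feat.shape
        q = img_feat.flatten(2).transpose(1, 2)   # (B, H*W, C) image queries
        kv = self.txt_proj(word_emb)              # (B, L, C) word keys/values
        ctx, _ = self.attn(q, kv, kv)             # per-location text context
        fused = self.norm(q + ctx)                # residual fusion of text into image
        return fused.transpose(1, 2).reshape(b, c, h, w)

# Usage: fuse an 8x8 feature map with 18 word embeddings.
block = WordFusionBlock()
out = block(torch.randn(2, 256, 8, 8), torch.randn(2, 18, 256))
print(out.shape)  # torch.Size([2, 256, 8, 8])
```

In the paper's design, several such fusion blocks are stacked and coordinated by the recurrent semantic fusion network, so the text information assigned to each block is governed globally rather than block by block; the word-level discriminator then scores the result against individual words to give the generator fine-grained feedback.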