Computer science
Artificial intelligence
Discriminator
Ground truth
Semantics (computer science)
Natural language processing
Similarity (geometry)
Image (mathematics)
Pattern recognition (psychology)
Telecommunications
Detector
Electrical engineering
Programming language
Engineering
Authors
Liang Zhao, Pingda Huang, Tengtuo Chen, Chunjiang Fu, Qinghao Hu, Yangqianhui Zhang
Identifier
DOI:10.1109/tmm.2023.3297769
Abstract
Generating realistic images from text descriptions remains challenging in computer vision. Existing multi-stage generation methods can produce high-resolution images. However, these methods typically synthesize images from a single sentence, from which it is difficult to extract adequate semantic features, so the generated images deviate substantially from the ground-truth images. In this paper, we propose a Multi-Sentence Complementary Generative Adversarial Network (MSCGAN), which helps generate accurate images by fusing the semantics shared across different sentences while preserving the semantics unique to each. More specifically, the BERT model is employed to extract semantic features, and a multi-semantic fusion module (MSFM) is designed to fuse the semantic features of different sentences. In addition, a pre-trained cross-modal contrastive similarity model (CCSM) is developed to impose a fine-grained loss on generated images. Moreover, a multi-sentence joint discriminator is designed to ensure that the generated images match all sentences. Experiments and ablation studies on the CUB and MS-COCO datasets demonstrate the significant superiority of the proposed method over state-of-the-art methods.
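The core fusion idea in the abstract — combining the semantics shared across several captions while preserving what each one uniquely contributes — can be illustrated with a toy sketch. The function below is an illustrative assumption, not the paper's actual MSFM: it treats each sentence as a fixed-length embedding vector (e.g., a BERT sentence vector), takes the mean as the shared semantics, and re-adds each sentence's residual weighted by how much that sentence diverges from the consensus:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def fuse_sentence_vectors(vectors):
    """Toy multi-sentence fusion (illustrative only).

    The mean vector captures the semantics shared by all sentences;
    each sentence's residual from the mean is added back, weighted by
    (1 - cosine similarity to the mean), so sentences that differ more
    from the consensus keep more of their unique semantics.
    """
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(dim)]
    fused = list(mean)
    for v in vectors:
        w = 1.0 - cosine(v, mean)  # more unique -> larger weight
        for i in range(dim):
            fused[i] += w * (v[i] - mean[i])
    return fused

# Identical captions contribute no residual: the fusion is just the mean.
fused = fuse_sentence_vectors([[1.0, 0.0], [1.0, 0.0]])
```

In the actual model, such fused features would condition the generator, and the joint discriminator would check the output image against every input sentence; the weighting rule here is purely a stand-in for the learned fusion module.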