Keywords
Prior probability
Computer science
Artificial intelligence
Leverage (statistics)
Image (mathematics)
Embedding
Encoder
Machine learning
Natural language processing
Bayesian probability
Operating system
Authors
Tingting Qiao, Jing Zhang, Duanqing Xu, Dacheng Tao
Source
Journal: Neural Information Processing Systems
Date: 2019-01-01
Volume: 32, Pages: 885-895
Cited by: 67
Abstract
Text-to-image generation, i.e., generating an image given a text description, is a very challenging task due to the significant semantic gap between the two domains. Humans, however, tackle this problem intelligently. We learn from diverse objects to form a solid prior about semantics, textures, colors, shapes, and layouts. Given a text description, we immediately imagine an overall visual impression using this prior and, based on this, we draw a picture by progressively adding more and more details. In this paper, inspired by this process, we propose a novel text-to-image method called LeicaGAN that combines these three phases in a unified framework. First, we formulate the multiple priors learning phase as a textual-visual co-embedding (TVE) comprising a text-image encoder for learning semantic, texture, and color priors and a text-mask encoder for learning shape and layout priors. Then, we formulate the imagination phase as multiple priors aggregation (MPA) by combining these complementary priors and adding noise for diversity. Lastly, we formulate the creation phase by using a cascaded attentive generator (CAG) to progressively draw a picture from coarse to fine. We leverage adversarial learning for LeicaGAN to enforce semantic consistency and visual realism. Thorough experiments on two public benchmark datasets demonstrate LeicaGAN's superiority over the baseline method. Code has been made available at https://github.com/qiaott/LeicaGAN.
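To make the three phases concrete, the sketch below traces the abstract's data flow (TVE text encoders, MPA fusion with noise, CAG coarse-to-fine generation) in PyTorch. It is a minimal, hypothetical illustration: every layer choice, dimension, and class name other than TVE/MPA/CAG is an assumption, and the attention modules, image/mask encoders, and adversarial training are omitted; the authors' actual implementation is at https://github.com/qiaott/LeicaGAN.

```python
import torch
import torch.nn as nn


class TextEncoder(nn.Module):
    """Stand-in for one TVE branch. The paper pairs a text-image encoder
    (semantic, texture, and color priors) with a text-mask encoder (shape
    and layout priors); structurally, each maps a caption to an embedding."""

    def __init__(self, vocab_size=5000, embed_dim=256, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids):             # (batch, seq_len), int64
        _, h = self.rnn(self.embed(token_ids))
        return h[-1]                           # (batch, hidden_dim)


class MPA(nn.Module):
    """Multiple priors aggregation: concatenate the complementary priors,
    append Gaussian noise for diversity, and project to a single code."""

    def __init__(self, hidden_dim=256, noise_dim=100):
        super().__init__()
        self.noise_dim = noise_dim
        self.fuse = nn.Sequential(
            nn.Linear(2 * hidden_dim + noise_dim, hidden_dim), nn.ReLU())

    def forward(self, semantic_prior, layout_prior):
        z = torch.randn(semantic_prior.size(0), self.noise_dim,
                        device=semantic_prior.device)
        return self.fuse(torch.cat([semantic_prior, layout_prior, z], dim=1))


class CascadedGenerator(nn.Module):
    """CAG stand-in: three stages emit a 64/128/256 px image pyramid.
    The paper's attention modules and discriminators are omitted."""

    def __init__(self, code_dim=256, ch=64):
        super().__init__()
        self.ch = ch
        self.to_base = nn.Sequential(nn.Linear(code_dim, ch * 8 * 8), nn.ReLU())

        def up():  # double the spatial resolution
            return nn.Sequential(nn.Upsample(scale_factor=2),
                                 nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())

        self.stages = nn.ModuleList([
            nn.Sequential(up(), up(), up()),  # 8 -> 64
            up(),                             # 64 -> 128
            up(),                             # 128 -> 256
        ])
        self.to_rgb = nn.ModuleList(
            [nn.Conv2d(ch, 3, 3, padding=1) for _ in range(3)])

    def forward(self, code):
        h = self.to_base(code).view(-1, self.ch, 8, 8)
        images = []
        for stage, head in zip(self.stages, self.to_rgb):
            h = stage(h)                        # refine features at 2x scale
            images.append(torch.tanh(head(h)))  # emit an image at this scale
        return images                           # coarse-to-fine outputs


if __name__ == "__main__":
    tokens = torch.randint(0, 5000, (2, 16))  # a dummy batch of two captions
    sem_prior = TextEncoder()(tokens)         # text-image encoder branch
    lay_prior = TextEncoder()(tokens)         # text-mask encoder branch
    code = MPA()(sem_prior, lay_prior)
    for img in CascadedGenerator()(code):
        print(tuple(img.shape))
```

Running the script prints one output shape per stage, (2, 3, 64, 64), (2, 3, 128, 128), and (2, 3, 256, 256), mirroring the paper's progressive coarse-to-fine drawing.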