Keywords: Computer science, Autoregressive model, Image (mathematics), Interpolation (computer graphics), Generative model, Inference, Architecture, Generative grammar, Limit (mathematics), Scaling, Scale (ratio), Image scaling, Artificial intelligence, Algorithm, Theoretical computer science, Image processing, Mathematics, Econometrics, Geometry, Mathematical analysis, Physics, Quantum mechanics, Art, Visual art
Authors
Minguk Kang, Jun-Yan Zhu, Richard Zhang, Jaesik Park, Eli Shechtman, Sylvain Paris, Taesung Park
Identifier
DOI: 10.1109/cvpr52729.2023.00976
Abstract
The recent success of text-to-image synthesis has taken the world by storm and captured the general public's imagination. From a technical standpoint, it also marked a drastic change in the favored architecture to design generative image models. GANs used to be the de facto choice, with techniques like StyleGAN. With DALL·E 2, autoregressive and diffusion models became the new standard for large-scale generative models overnight. This rapid shift raises a fundamental question: can we scale up GANs to benefit from large datasets like LAION? We find that naïvely increasing the capacity of the StyleGAN architecture quickly becomes unstable. We introduce GigaGAN, a new GAN architecture that far exceeds this limit, demonstrating GANs as a viable option for text-to-image synthesis. GigaGAN offers three major advantages. First, it is orders of magnitude faster at inference time, taking only 0.13 seconds to synthesize a 512px image. Second, it can synthesize high-resolution images, for example, 16-megapixel images in 3.66 seconds. Finally, GigaGAN supports various latent space editing applications such as latent interpolation, style mixing, and vector arithmetic operations.
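The latent-space edits named at the end of the abstract (interpolation and vector arithmetic) are standard GAN operations on the input noise code. The sketch below illustrates them with NumPy; the generator `G` and the attribute direction are hypothetical placeholders, not GigaGAN's actual API, and spherical interpolation (slerp) is one common convention for Gaussian latents rather than the method the paper specifies.

```python
import numpy as np

def slerp(z0, z1, t):
    """Spherical interpolation between two latent codes.

    Often preferred over linear interpolation for GAN latents, which
    are drawn from a Gaussian and concentrate near a hypersphere.
    """
    u0 = z0 / np.linalg.norm(z0)
    u1 = z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(u0, u1), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return (1 - t) * z0 + t * z1  # nearly parallel: fall back to lerp
    return (np.sin((1 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

def G(z):
    # Hypothetical stand-in for a text-to-image generator; a real model
    # would decode z (plus a text embedding) into an image tensor.
    return z

rng = np.random.default_rng(0)
z_a = rng.standard_normal(512)
z_b = rng.standard_normal(512)

# Latent interpolation: decoding intermediate codes yields a smooth
# morph between the two generated images.
frames = [G(slerp(z_a, z_b, t)) for t in np.linspace(0.0, 1.0, 8)]

# Vector arithmetic: adding a semantic direction (hypothetically derived
# from attribute examples) shifts a property of the output image.
direction = rng.standard_normal(512)
edited = G(z_a + 1.5 * direction)
```

Style mixing, the third application mentioned, follows the same idea but swaps subsets of the per-layer style codes between two latents rather than blending a single vector.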