Autoencoder
Computer science
Perspective (graphics)
Noise (video)
Diffusion
Generative grammar
Generative model
Function (biology)
Computation
Scalability
Artificial intelligence
Applied mathematics
Markov chain
Artificial neural network
Mathematical optimization
Theoretical computer science
Algorithm
Machine learning
Mathematics
Image (mathematics)
Physics
Database
Evolutionary biology
Biology
Thermodynamics
Source
Journal: Cornell University - arXiv
Date: 2022-01-01
Citations: 69
Identifier
DOI: 10.48550/arxiv.2208.11970
Abstract
Diffusion models have shown incredible capabilities as generative models; indeed, they power the current state-of-the-art models on text-conditioned image generation such as Imagen and DALL-E 2. In this work we review, demystify, and unify the understanding of diffusion models across both variational and score-based perspectives. We first derive Variational Diffusion Models (VDM) as a special case of a Markovian Hierarchical Variational Autoencoder, where three key assumptions enable tractable computation and scalable optimization of the ELBO. We then prove that optimizing a VDM boils down to learning a neural network to predict one of three potential objectives: the original source input from any arbitrary noisification of it, the original source noise from any arbitrarily noisified input, or the score function of a noisified input at any arbitrary noise level. We then dive deeper into what it means to learn the score function, and connect the variational perspective of a diffusion model explicitly with the Score-based Generative Modeling perspective through Tweedie's Formula. Lastly, we cover how to learn a conditional distribution using diffusion models via guidance.
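As a quick illustration of the reduction the abstract describes, the sketch below implements the noise-prediction form of the VDM training objective and notes, in comments, how the other two equivalent targets (the original source input and the score function) follow from the same prediction via Tweedie's Formula. This is a minimal sketch, not the paper's code: the network model, the schedule tensor alpha_bar, and all shapes are illustrative assumptions following common DDPM notation.

import torch

def diffusion_loss(model, x0, alpha_bar):
    """One step of the noise-prediction objective (illustrative sketch).

    model:     maps (noisy input x_t, timestep t) -> predicted noise
    x0:        batch of clean samples, shape (B, ...)
    alpha_bar: 1-D tensor of cumulative noise-schedule values in (0, 1)
    """
    B = x0.shape[0]
    t = torch.randint(0, len(alpha_bar), (B,))          # arbitrary noise level per sample
    ab = alpha_bar[t].view(B, *([1] * (x0.dim() - 1)))  # broadcast schedule to x0's shape
    eps = torch.randn_like(x0)                          # source noise the network must predict
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps        # arbitrarily noisified input
    eps_hat = model(x_t, t)
    # Via Tweedie's Formula, the same prediction fixes the other two targets:
    #   x0_hat    = (x_t - (1 - ab).sqrt() * eps_hat) / ab.sqrt()   # original source input
    #   score_hat = -eps_hat / (1 - ab).sqrt()                     # score at this noise level
    return torch.mean((eps - eps_hat) ** 2)

Under this convention a single trained noise predictor simultaneously yields an estimate of the clean input and of the score at every noise level, which is the three-way equivalence the abstract states.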