The paper "Analyzing and Improving the Training Dynamics of Diffusion Models" (https://arxiv.org/abs/2312.02696) by Tero Karras et al. addresses a fundamental challenge in training diffusion models, the class of generative models that currently dominates image synthesis. This survey explores the paper's key contributions and how they relate to the broader research landscape around diffusion models, focusing on training dynamics and, in particular, the use of an Exponential Moving Average (EMA) of the model weights.
Diffusion Models: A Primer
Diffusion models are built around a forward process that gradually adds noise to an image until it becomes pure random noise, much as a drop of ink slowly diffuses in water. The model learns to reverse this process: starting from pure noise, it denoises step by step until the result resembles a realistic image. At the core of the model is a neural network trained to predict the noise that was added at each step of the forward process. That prediction is what allows the model to run the process in reverse and generate new images.
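To make the forward process and the noise-prediction objective concrete, here is a minimal NumPy sketch. It assumes a DDPM-style linear noise schedule, a common textbook formulation and not necessarily the one used in the Karras et al. paper. The function names (make_alpha_bar, add_noise, noise_prediction_loss), the schedule constants, and the 8x8 toy "image" are all illustrative choices, and no actual network is trained here.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """Forward process: jump from a clean image x0 directly to step t."""
    eps = rng.standard_normal(x0.shape)  # the noise the network must predict
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

def noise_prediction_loss(eps_pred, eps):
    """Mean squared error between predicted and true noise."""
    return float(np.mean((eps_pred - eps) ** 2))

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))  # toy stand-in for an image

xt_early, _ = add_noise(x0, 0, alpha_bar, rng)           # almost clean
xt_late, eps_late = add_noise(x0, 999, alpha_bar, rng)   # almost pure noise

# A placeholder "network" that always predicts zero noise, used only to
# show how the loss is evaluated; a real model would be trained to lower it.
loss = noise_prediction_loss(np.zeros_like(eps_late), eps_late)
```

In this sketch the image is nearly untouched at the first step and nearly indistinguishable from Gaussian noise at the last one. Training would replace the zero predictor with a network that takes the noisy image and the step index as input and is optimized to reduce this loss.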
The Challenge of Training Dynamics