A Tutorial on

Foundations of Diffusion Models for Visual Content Generation

PREMI 2025 Conference @ IIT-Delhi, 10-14 December, 2025

Abstract

Diffusion Models have rapidly emerged as a groundbreaking paradigm in generative AI, demonstrating unprecedented capabilities in synthesizing high-quality, diverse, and realistic visual content. This tutorial provides a comprehensive introduction to the foundational principles and practical applications of diffusion models, specifically tailored for researchers, practitioners, and students interested in visual content generation. We will begin by demystifying the core mathematical and probabilistic concepts underlying diffusion processes. The tutorial will delve into key architectural components alongside various sampling strategies. Furthermore, we will explore advanced topics including conditional generation, classifier-free guidance, and the role of latent diffusion models in scaling these powerful techniques to high-resolution imagery.


Objective of the Tutorial : Through a blend of theoretical exposition, intuitive explanations and a hands-on session, participants will gain a solid understanding of how diffusion models learn complex data distributions and effectively generate visual content. This tutorial will enable them to confidently explore and contribute to this exciting field.

Expected background of the audience : The tutorial is suitable for senior undergraduate and master's students, researchers, and practitioners who want to understand the fundamentals of visual content generation. The only prerequisite is prior familiarity with deep learning fundamentals (CNNs, transformers, etc.) and basic probability concepts.


Speaker Bio

Dr. Lokender Tiwari is a Senior Research Scientist / Principal Investigator @ TCS Research, IIT-Delhi R&I Park.


[ Personal Webpage ]

[ email : lokender.work@gmail.com ]

Schedule
  • Section 1: The Foundations of Score-Based Generative Models (50 Mins) [PDF]
  • Section 2: Denoising Diffusion Probabilistic Models (DDPMs) as a Score-Based Model (40 Mins) [PDF]
  • Section 3: Sampling Strategies and Conditional Generation (40 Mins) [PDF]
  • Section 4: Advanced Architectures and Latent Score-Based Models (20 Mins) [PDF]
  • Section 5: Hands-on session on Image Generation (30 Mins) (participants must create a Google Colab account for the hands-on session) [PDF]
Sample References
  1. Ho, Jonathan, Ajay Jain, and Pieter Abbeel. "Denoising diffusion probabilistic models." Advances in neural information processing systems 33 (2020): 6840-6851. [PDF]
  2. Song, Yang, et al. "Sliced score matching: A scalable approach to density and score estimation." Uncertainty in artificial intelligence. PMLR, 2020. [PDF]
  3. Song, Yang, et al. "Score-based generative modeling through stochastic differential equations." arXiv preprint arXiv:2011.13456 (2020). [PDF]
  4. Song, Yang, and Stefano Ermon. "Generative modeling by estimating gradients of the data distribution." Advances in neural information processing systems 32 (2019). [PDF]
  5. Song, Jiaming, Chenlin Meng, and Stefano Ermon. "Denoising diffusion implicit models." arXiv preprint arXiv:2010.02502 (2020). [PDF]
  6. Rombach, Robin, et al. "High-resolution image synthesis with latent diffusion models." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022. [PDF]
  7. Lin, Chen-Hsuan, et al. "Magic3d: High-resolution text-to-3d content creation." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2023. [PDF]
  8. Jain, Ajay, et al. "Zero-shot text-guided object generation with dream fields." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022. [PDF]
  9. Poole, Ben, et al. "Dreamfusion: Text-to-3d using 2d diffusion." arXiv preprint arXiv:2209.14988 (2022). [PDF]
  10. Nichol, Alex, et al. "Glide: Towards photorealistic image generation and editing with text-guided diffusion models." arXiv preprint arXiv:2112.10741 (2021). [PDF]
  11. Ramesh, Aditya, et al. "Hierarchical text-conditional image generation with clip latents." arXiv preprint arXiv:2204.06125 (2022). [PDF]
  12. Meng, Chenlin, et al. "Sdedit: Guided image synthesis and editing with stochastic differential equations." arXiv preprint arXiv:2108.01073 (2021). [PDF]