Stable Diffusion

Henley Zhang
November 18, 2024

Part 5A The Power of Diffusion Models

In this part, I experimented with a pretrained diffusion model called DeepFloyd IF. I first used the model to denoise images, then built on that denoising step to implement diffusion sampling. Finally, I implemented classifier-free guidance.

Part 0. Sampling using a diffusion model

Approach

For this part, I used PyTorch with the DeepFloyd diffusion model to generate images from text prompts. This was done using 20 and 100 steps, with seed 1234. The quality improves with more steps, and the outputs reflect the text prompts.

Results steps = 20

An oil painting of a snowy mountain

man wearing a hat

rocket ship

Results steps = 30

An oil painting of a snowy mountain

man wearing a hat

rocket ship

1.1 Forward Process

Approach

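Here I wrote the forward process, the equation that adds noise to an image. The noisy image at timestep t is Gaussian with mean sqrt(alpha) times the clean image and variance (1 - alpha), i.e. standard deviation sqrt(1 - alpha), where alpha is the cumulative noise coefficient at timestep t.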
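As a rough illustration, the forward process can be sketched in NumPy as below. This is not the code used for the results: the function name `forward`, the `alphas_cumprod` array, and the linear beta schedule are assumptions made only so the example runs on its own (the real schedule comes from the pretrained model).

```python
import numpy as np

def forward(x0, t, alphas_cumprod, rng=None):
    """Noise a clean image x0 to timestep t; returns (x_t, eps)."""
    rng = np.random.default_rng() if rng is None else rng
    a_bar = alphas_cumprod[t]
    eps = rng.standard_normal(x0.shape)
    # x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    return x_t, eps

# Hypothetical linear beta schedule (assumption, for illustration only).
betas = np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)
```

Larger t means a smaller cumulative alpha, so less of the clean image survives and more noise is mixed in.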

Results

Original Image

Image with noise added t=250

Image with noise added t=500

Image with noise added t=750

1.2 Classical Denoising

Approach

Here I tried to denoise the noisy images classically by applying a Gaussian blur with kernel size 13 and sigma 2.
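For reference, a Gaussian blur with these parameters can be sketched in plain NumPy as a separable convolution. This is an illustrative stand-in, not the library call used for the results; the function names and the reflect padding are assumptions.

```python
import numpy as np

def gaussian_kernel1d(size=13, sigma=2.0):
    """Normalized 1D Gaussian kernel centered on the middle tap."""
    ax = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-0.5 * (ax / sigma) ** 2)
    return k / k.sum()

def gaussian_blur(img, size=13, sigma=2.0):
    """Separable Gaussian blur of a 2D (H, W) image with reflect padding."""
    k = gaussian_kernel1d(size, sigma)
    pad = size // 2
    padded = np.pad(img, pad, mode="reflect")
    # Blur along each row, then along each column.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)
```

Blurring averages the noise away but also averages away image detail, which is why this baseline degrades quickly at higher noise levels.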

Results

Gaussian Blur Denoised t=250

Gaussian Blur Denoised t=500

Gaussian Blur Denoised t=750

1.3 One Step Denoising

Approach

Use the pretrained diffusion model and the given equations to recover the image from the noisy input. The model predicts the noise, and the equation is used to recover the predicted clean image.
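The recovery step is just the forward equation solved for the clean image. A minimal sketch, assuming the predicted noise `eps_hat` has already been obtained from the model and `a_bar` is the cumulative alpha at the current timestep (both names are illustrative):

```python
import numpy as np

def estimate_x0(x_t, eps_hat, a_bar):
    """Invert x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps for x0."""
    return (x_t - np.sqrt(1.0 - a_bar) * eps_hat) / np.sqrt(a_bar)
```

If the noise prediction were exact, this would return the original image exactly; in practice the estimate gets blurrier as t grows because the prediction is less reliable.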

Results

Original Image

Image with noise added t=250

One step denoised t=250

Image with noise added t=500

One step denoised t=500

Image with noise added t=750

One step denoised t=750

One Step Denoised Result

Gaussian Blur Denoised t=750

1.4 Iterative Denoising

Approach

We can get a much better result if we denoise iteratively instead of in a single step. We first create a list of strided timesteps. For each timestep, we use the update equation to compute the next, slightly less noisy image.
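One common form of this update, sketched under assumptions: the start of 990 and stride of 30 are guesses that happen to match the t values in the captions (90, 240, ..., 690), and `denoise_step` with its argument names is hypothetical. The clean-image estimate `x0_hat` is the one-step estimate at the current timestep, and any added variance term is passed in as `noise`.

```python
import numpy as np

def strided_timesteps(start=990, stride=30):
    """Timesteps from noisiest to cleanest, e.g. 990, 960, ..., 0."""
    return list(range(start, -1, -stride))

def denoise_step(x_t, x0_hat, a_bar_t, a_bar_prev, noise=0.0):
    """Move from timestep t to the less noisy timestep t' (prev)."""
    alpha = a_bar_t / a_bar_prev   # effective alpha between t' and t
    beta = 1.0 - alpha
    coef_x0 = np.sqrt(a_bar_prev) * beta / (1.0 - a_bar_t)
    coef_xt = np.sqrt(alpha) * (1.0 - a_bar_prev) / (1.0 - a_bar_t)
    # Blend the clean estimate with the current noisy image.
    return coef_x0 * x0_hat + coef_xt * x_t + noise
```

Each step interpolates between the current noisy image and the current clean estimate, so errors in any single noise prediction get corrected by later steps.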

Results

Iterative Denoised t=90

Iterative Denoised t=240

Iterative Denoised t=390

Iterative Denoised t=540

Iterative Denoised t=690

Iterative Denoised Result

One Step Denoised Result

Gaussian Blur Denoised Result

1.5 Diffusion Model Sampling

Approach

Instead of starting from a noisy image, we can start from pure noise and iteratively denoise it to sample from the diffusion model.

Results

Sample 1

Sample 2

Sample 3

Sample 4

Sample 5

1.6 Classifier Free Guidance

Approach

We can get much higher quality images if we apply classifier-free guidance (CFG). We compute both a conditional and an unconditional noise estimate and combine them.
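The standard CFG combination extrapolates from the unconditional estimate toward the conditional one. A minimal sketch; the function name and the default `scale` of 7.0 are assumptions, not values taken from this report:

```python
import numpy as np

def cfg_noise(eps_cond, eps_uncond, scale=7.0):
    """eps = eps_uncond + scale * (eps_cond - eps_uncond)."""
    return eps_uncond + scale * (eps_cond - eps_uncond)
```

With `scale = 1` this reduces to the conditional estimate, and with `scale = 0` to the unconditional one; values above 1 push the sample harder toward the prompt.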

Results

Sample 1

Sample 2

Sample 3

Sample 4

Sample 5

1.7 Image to Image Translation

Approach

Here we take an image, add noise to it, and then denoise it. The noised version of the original image is forced back onto the natural image manifold without any additional conditioning.
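The procedure can be sketched as follows, assuming `i_start` indexes into a list of strided timesteps ordered from noisiest to cleanest. The names `sdedit` and `denoise_from` are hypothetical; `denoise_from` stands in for the iterative denoising loop started at index `i_start`.

```python
import numpy as np

def sdedit(image, i_start, timesteps, alphas_cumprod, denoise_from, rng=None):
    """Noise `image` to timesteps[i_start], then denoise from there."""
    rng = np.random.default_rng() if rng is None else rng
    t = timesteps[i_start]
    a_bar = alphas_cumprod[t]
    eps = rng.standard_normal(image.shape)
    noisy = np.sqrt(a_bar) * image + np.sqrt(1.0 - a_bar) * eps
    # denoise_from is a placeholder for the iterative denoising loop.
    return denoise_from(noisy, i_start)
```

Under that ordering assumption, a small `i_start` starts from heavy noise and gives the model freedom to change the image, while a large `i_start` stays close to the original.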

Results

Original Image Example 1

Ex1 SDEdit i_start = 1, 3, 5, 7, 10, 20

Original Image Example 2

Ex2 SDEdit i_start = 1, 3, 5, 7, 10, 20

Original Image Example 3

Ex3 SDEdit i_start = 1, 3, 5, 7, 10, 20

Hand Drawn and Web Images

We apply the same process to hand-drawn and web images.

Results

Avocado Ex 1

Avocado SDEdit i_start = 1, 3, 5, 7, 10, 20

Earth Ex 1

Earth SDEdit i_start = 1, 3, 5, 7, 10, 20

Tomato Ex 1

Tomato SDEdit i_start = 1, 3, 5, 7, 10, 20

1.7.2 InPainting

Approach

We run the denoising process so that we keep the original content wherever the mask is 0 and generate new, diffused content wherever the mask is 1. The masking equation is applied at every denoising step.
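A sketch of that per-step masking, assuming the usual formulation in which the region outside the mask is overwritten with the original image noised to the current timestep. The name `inpaint_step` and its arguments are illustrative.

```python
import numpy as np

def inpaint_step(x_t, x_orig, mask, a_bar, rng=None):
    """Keep generated content where mask == 1, original elsewhere."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(x_orig.shape)
    # Original image pushed to the same noise level as x_t.
    noised_orig = np.sqrt(a_bar) * x_orig + np.sqrt(1.0 - a_bar) * eps
    return mask * x_t + (1.0 - mask) * noised_orig
```

Because the unmasked region is reset at every step, the model only has to fill the hole in a way that is consistent with its surroundings.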

Results

Example 1: Original, mask, Hole to fill, Result

Example 2: Original, mask, Hole to fill, Result

Example 3: Original, mask, Hole to fill, Result

1.7.3 Text-Conditional Image-to-Image Translation

Approach

Instead of denoising with a generic photo prompt, we can guide the same noise-then-denoise procedure with a different text prompt, so the result drifts toward that prompt.

Results

Rocket Noise Level 1, 3, 5, 7, 10, 20

Original

Oski Original

Skull Noise Level 1, 3, 5, 7, 10, 20

Puff Original

Skull Noise Level 1, 3, 5, 7, 10, 20

1.8 Visual Anagrams

Approach

We can make visual anagrams, where the image looks like one thing from one side and another thing when flipped. At step t we use the first prompt to get the noise estimate e1, then flip the image and use the second prompt to get the noise estimate e2, which we flip back. We then average the two estimates and use the result as our noise estimate.
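The averaging can be sketched as below. Here `noise_fn(x, t, prompt)` is a hypothetical stand-in for the model's noise prediction, and the flip is assumed to be vertical (upside down), with height on the second-to-last axis.

```python
import numpy as np

def anagram_noise(x_t, t, noise_fn, prompt_a, prompt_b):
    """Average the upright estimate with the flipped-back estimate."""
    eps_a = noise_fn(x_t, t, prompt_a)
    # Flip the image, estimate noise for the second prompt, flip back.
    flipped = np.flip(x_t, axis=-2)
    eps_b = np.flip(noise_fn(flipped, t, prompt_b), axis=-2)
    return 0.5 * (eps_a + eps_b)
```

Flipping the second estimate back matters: both estimates must be in the same orientation as `x_t` before they are averaged.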

Results

An Oil Painting of People Around a Fire

An Old Man

Snowy Village

Barista

Rocket Ship

Pencil

1.9 Hybrid Images

Approach

We can do something similar to Project 2, where we created hybrid images. We take the low-pass-filtered noise estimate from one prompt and the high-pass-filtered noise estimate from the other prompt, and combine them into the noise estimate used for denoising.
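A minimal sketch of the combination, assuming a Gaussian blur as the low-pass filter and "input minus blur" as the high-pass filter. The kernel size 13 and sigma 2.0 are placeholders rather than the values used for the results, and both function names are illustrative.

```python
import numpy as np

def lowpass(x, size=13, sigma=2.0):
    """Separable Gaussian blur over the last two axes (reflect padding)."""
    ax = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-0.5 * (ax / sigma) ** 2)
    k /= k.sum()
    pad = size // 2
    widths = [(0, 0)] * (x.ndim - 2) + [(pad, pad), (pad, pad)]
    out = np.pad(x, widths, mode="reflect")
    for axis in (-2, -1):
        out = np.apply_along_axis(
            lambda v: np.convolve(v, k, mode="valid"), axis, out)
    return out

def hybrid_noise(eps_low_prompt, eps_high_prompt):
    """Low frequencies from one estimate, high frequencies from the other."""
    high = eps_high_prompt - lowpass(eps_high_prompt)
    return lowpass(eps_low_prompt) + high
```

The prompt supplying the low frequencies dominates from far away, and the prompt supplying the high frequencies dominates up close.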

Results

Dog Skull

Rocket Waterfall

Waterfall Skull

Part 5B Diffusion Models From Scratch

1.2 Using the UNet to Train a Denoiser

Approach

We first implement a UNet model to train a denoiser. Let's first visualize the effect of adding noise to the image.

To implement the UNet, I followed the architecture shown in the figure.

I trained the denoiser by regressing to the original image after adding noise to it.
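The training objective can be sketched as below, assuming the noisy input is formed as z = x + sigma * eps with sigma = 0.5 (the training value stated in this section) and an L2 loss against the clean image. `denoiser` is any callable standing in for the UNet; both function names are hypothetical.

```python
import numpy as np

def add_noise(x, sigma=0.5, rng=None):
    """z = x + sigma * eps, with eps drawn from a standard normal."""
    rng = np.random.default_rng() if rng is None else rng
    return x + sigma * rng.standard_normal(x.shape)

def denoising_loss(denoiser, x, sigma=0.5, rng=None):
    """Mean squared error between denoiser(noisy x) and the clean x."""
    z = add_noise(x, sigma, rng)
    return np.mean((denoiser(z) - x) ** 2)
```

A denoiser that does nothing scores a loss of about sigma squared, which gives a baseline to compare the trained model against.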

Results

We can also check the denoiser's effectiveness on other values of sigma. It was trained with sigma = 0.5.

Training a DDPM Denoising UNET

Approach

We define the FCBlock and add it to the UNet, allowing the network to accept a time parameter.
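As a rough picture of what such a block computes, here is a forward-only NumPy sketch. It assumes the block is Linear, then GELU, then Linear, and that the timestep is fed in as a normalized scalar t / T; neither detail is confirmed by this report, and the actual implementation would be a trainable PyTorch module.

```python
import numpy as np

def gelu(x):
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

class FCBlock:
    """Linear -> GELU -> Linear, forward pass only (no training)."""

    def __init__(self, in_dim, out_dim, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        self.w1 = rng.standard_normal((in_dim, out_dim)) / np.sqrt(in_dim)
        self.b1 = np.zeros(out_dim)
        self.w2 = rng.standard_normal((out_dim, out_dim)) / np.sqrt(out_dim)
        self.b2 = np.zeros(out_dim)

    def __call__(self, t):
        return gelu(t @ self.w1 + self.b1) @ self.w2 + self.b2

# One normalized timestep per batch element, mapped to a per-channel vector.
t = np.array([[0.1], [0.9]])
emb = FCBlock(1, 8, np.random.default_rng(0))(t)
```

The resulting vector has one entry per feature channel, so it can be broadcast over the spatial dimensions of a UNet feature map to inject the timestep.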

We can then sample with a process similar to the one in Part 5A to implement the DDPM.

Results

5 Epochs Result

20 Epochs Result

Class Conditioned

Approach

Class conditioning was implemented so that specific digits can be generated. It was implemented by adding additional FCBlocks that inject class information into the UNet.
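One plausible way to build the class input for those blocks is a one-hot vector per digit, sketched below. The function name, the ten-class default, and the optional `p_uncond` dropout (which zeroes the vector for some samples so the model also learns an unconditional mode) are all assumptions, not details from this report.

```python
import numpy as np

def class_vector(labels, num_classes=10, p_uncond=0.0, rng=None):
    """One-hot class vectors; each row is zeroed with probability p_uncond."""
    rng = np.random.default_rng() if rng is None else rng
    one_hot = np.eye(num_classes)[labels]
    # keep == 1.0 leaves the row intact, keep == 0.0 drops the conditioning.
    keep = (rng.random(len(labels)) >= p_uncond).astype(float)
    return one_hot * keep[:, None]
```

An all-zero row corresponds to "no class given", which is what makes classifier-free guidance possible at sampling time.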

Results

5 Epochs Result

20 Epochs Result