In this part, I experimented with a pretrained diffusion model called DeepFloyd IF and used it to denoise images. This was the basis for implementing diffusion, and classifier-free guidance was then implemented on top of it.
For this part, I used PyTorch along with the DeepFloyd diffusion model to generate images. This was done using 20 and 100 inference steps with seed 1234. The quality does improve with more steps, and the outputs do reflect the text prompts.
Here I wrote the forward process, the equation that adds noise to an image: the noised image has mean sqrt(alpha_bar_t) * x_0 and variance (1 - alpha_bar_t).
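A minimal sketch of the forward process in PyTorch (the function name and signature are illustrative, not the project's actual code):

```python
import torch

def forward_process(x0, alpha_bar_t):
    """Sample x_t ~ q(x_t | x_0): scale the clean image by sqrt(alpha_bar_t)
    and add Gaussian noise with std sqrt(1 - alpha_bar_t).
    Returns both the noised image and the noise used (handy for training)."""
    eps = torch.randn_like(x0)
    xt = torch.sqrt(alpha_bar_t) * x0 + torch.sqrt(1 - alpha_bar_t) * eps
    return xt, eps
```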
Here I tried classical denoising by applying a Gaussian blur to the noised images (kernel size 13, sigma 2).
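This classical baseline could be sketched as below; a separable Gaussian kernel built by hand keeps the example self-contained (the project may well have used a library blur instead):

```python
import torch
import torch.nn.functional as F

def gaussian_blur(img, kernel_size=13, sigma=2.0):
    """Blur a (B, C, H, W) image with a Gaussian filter, as a crude denoiser."""
    half = kernel_size // 2
    coords = torch.arange(kernel_size, dtype=torch.float32) - half
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    g = g / g.sum()
    kernel2d = torch.outer(g, g)  # separable Gaussian -> 2D kernel
    c = img.shape[1]
    kernel = kernel2d.expand(c, 1, kernel_size, kernel_size)
    # depthwise convolution: one Gaussian per channel
    return F.conv2d(img, kernel, padding=half, groups=c)
```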
Next, I used the pretrained diffusion model and the given equations to recover the image from the noise. The model predicts the noise, and the equation is used to recover the predicted clean image.
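Inverting the forward process with the model's noise estimate can be sketched like this (function names are illustrative):

```python
import torch

def one_step_denoise(xt, eps_pred, alpha_bar_t):
    """Estimate the clean image from a noisy one in a single step:
    x0_hat = (x_t - sqrt(1 - alpha_bar_t) * eps_pred) / sqrt(alpha_bar_t)."""
    return (xt - torch.sqrt(1 - alpha_bar_t) * eps_pred) / torch.sqrt(alpha_bar_t)
```

With the true noise this inverts the forward process exactly; with a model's estimate it gives a (blurry, at high t) one-step guess of the clean image.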
We can get a much better result if we denoise in steps to reach the clean image. We first create a list of strided timesteps. For each timestep we use this equation to recover the next, less noisy iteration of the image.
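One strided update can be sketched as below, following the standard DDPM posterior-mean form; the added-noise term is omitted here for a deterministic sketch, and the names are illustrative:

```python
import torch

def iterative_denoise_step(xt, x0_hat, alpha_bar_t, alpha_bar_prev):
    """Blend the one-step clean estimate x0_hat with the current noisy xt
    to move from timestep t to the next (less noisy) strided timestep t'.
    alpha_bar_t / alpha_bar_prev are the cumulative products at t and t'."""
    alpha = alpha_bar_t / alpha_bar_prev   # effective alpha over the stride
    beta = 1 - alpha
    coef_x0 = torch.sqrt(alpha_bar_prev) * beta / (1 - alpha_bar_t)
    coef_xt = torch.sqrt(alpha) * (1 - alpha_bar_prev) / (1 - alpha_bar_t)
    return coef_x0 * x0_hat + coef_xt * xt
```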
Instead of starting from a noised version of an existing image, we can start from pure noise to sample novel images from the diffusion model.
We can get a much higher quality image if we apply classifier-free guidance (CFG). We combine the noise estimates of the conditional and the unconditional predictions.
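The CFG combination is a one-liner (the guidance scale value is illustrative):

```python
def cfg_noise(eps_cond, eps_uncond, scale=7.0):
    """Classifier-free guidance: extrapolate past the conditional estimate.
    scale > 1 strengthens prompt adherence; scale = 1 is plain conditional."""
    return eps_uncond + scale * (eps_cond - eps_uncond)
```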
Here we take an image, add noise to it, and then denoise. Starting the denoising loop from the noised original forces it back onto the natural-image manifold without conditioning.
We apply the same process to hand-drawn and web images.
We apply the process so that we keep the original content wherever the mask is 0 and generate diffused content wherever the mask is 1, applying this equation at every step.
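The per-step masking could be sketched as follows (names and signature are illustrative): after each denoising step, the unmasked region is replaced with an appropriately noised copy of the original image.

```python
import torch

def inpaint_step(xt, x_orig, mask, alpha_bar_t):
    """Keep generated content where mask == 1; elsewhere force xt back to a
    noised copy of the original: x_t <- m * x_t + (1 - m) * forward(x_orig, t)."""
    eps = torch.randn_like(x_orig)
    x_orig_t = torch.sqrt(alpha_bar_t) * x_orig + torch.sqrt(1 - alpha_bar_t) * eps
    return mask * xt + (1 - mask) * x_orig_t
```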
Instead of projecting back to a high-quality photo unconditionally, we can use a different text prompt to guide the image-to-image translation.
We can make visual anagrams, where the image looks like one thing right-side up and another thing upside down. At step t we use the first prompt to get noise estimate e1; then we flip the image and use the second prompt to get e2. We flip e2 back, average the two estimates, and use the result as our noise estimate.
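A sketch of one anagram noise estimate; `eps_model(x, prompt)` is a hypothetical stand-in for the diffusion model's noise prediction:

```python
import torch

def anagram_step(xt, eps_model, prompt1, prompt2):
    """Average the upright estimate with the flipped-image estimate.
    Flipping is over the height axis (dim -2) so the image reads
    differently upside down; e2 is flipped back before averaging."""
    e1 = eps_model(xt, prompt1)
    e2 = torch.flip(eps_model(torch.flip(xt, dims=[-2]), prompt2), dims=[-2])
    return (e1 + e2) / 2
```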
We can do something similar to Project 2 and create hybrid images. We take the low-pass of the noise estimate from one prompt and the high-pass of the noise estimate from the other, and combine them to diffuse.
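The frequency split could be sketched as below; a box blur stands in here for whatever low-pass filter the project actually used, and the kernel size is illustrative:

```python
import torch
import torch.nn.functional as F

def hybrid_noise(eps1, eps2, kernel_size=33):
    """Hybrid-image noise estimate: low frequencies from eps1 (prompt 1),
    high frequencies from eps2 (prompt 2)."""
    pad = kernel_size // 2
    def low(e):
        # replicate-pad so the smoothed output keeps the input's size
        return F.avg_pool2d(F.pad(e, (pad,) * 4, mode="replicate"),
                            kernel_size, stride=1)
    return low(eps1) + (eps2 - low(eps2))
```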
We first implement a UNet model to train a denoiser. Let's first visualize the effect of adding noise to the image.
To implement the UNet, I followed this architecture:
I trained the denoiser by regressing to the original image after adding noise to it. We can also check the denoiser's effectiveness at other values of sigma; it was trained with sigma = 0.5.
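The training objective can be sketched as a simple L2 regression (function name illustrative):

```python
import torch

def denoiser_loss(denoiser, x0, sigma=0.5):
    """Noise a clean batch with fixed sigma, then regress the denoiser's
    output back to the clean images with mean-squared error."""
    z = x0 + sigma * torch.randn_like(x0)
    return torch.mean((denoiser(z) - x0) ** 2)
```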
We define the FCBlock and add it to the UNet, allowing it to accept a time parameter.
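A minimal sketch of such a block, assuming a Linear-GELU-Linear layout (the exact widths and activation in the project's architecture may differ):

```python
import torch
import torch.nn as nn

class FCBlock(nn.Module):
    """Small fully-connected block that maps the scalar timestep t to a
    feature vector, which is then broadcast onto the UNet's feature maps."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_ch, out_ch),
            nn.GELU(),
            nn.Linear(out_ch, out_ch),
        )

    def forward(self, t):
        # t: (B, 1) normalized timestep
        return self.net(t)
```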
We can then sample with a process similar to part 5a to implement DDPM.
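The sampling loop can be sketched as standard DDPM ancestral sampling; `eps_model(x, t)` is a hypothetical stand-in for the trained time-conditioned UNet:

```python
import torch

def ddpm_sample(eps_model, betas, shape):
    """Start from pure noise and denoise step by step down to t = 0."""
    alphas = 1 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)
    for t in range(len(betas) - 1, -1, -1):
        eps = eps_model(x, t)
        # one-step estimate of the clean image, clamped to valid range
        x0_hat = (x - torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alpha_bars[t])
        x0_hat = x0_hat.clamp(-1, 1)
        if t > 0:
            coef0 = torch.sqrt(alpha_bars[t - 1]) * betas[t] / (1 - alpha_bars[t])
            coeft = torch.sqrt(alphas[t]) * (1 - alpha_bars[t - 1]) / (1 - alpha_bars[t])
            x = coef0 * x0_hat + coeft * x + torch.sqrt(betas[t]) * torch.randn(shape)
        else:
            x = x0_hat
    return x
```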
Class conditioning was implemented so that specific digits can be generated. It was implemented by adding additional FCBlocks that inject class information into the UNet.
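The conditioning vectors could be built as below; the one-hot encoding and the random dropout probability (which lets the model also learn the unconditional case, enabling CFG at sampling time) are assumptions of this sketch:

```python
import torch

def one_hot_condition(labels, num_classes=10, p_uncond=0.1):
    """One-hot class vectors fed to the conditioning FCBlocks; with
    probability p_uncond a vector is zeroed so the model also trains
    unconditionally."""
    c = torch.nn.functional.one_hot(labels, num_classes).float()
    drop = (torch.rand(labels.shape[0], 1) < p_uncond).float()
    return c * (1 - drop)
```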