Implementing DCGAN from Scratch
In this video, we will implement DCGAN from scratch and train it to generate images using deep convolutional neural networks. We will follow the guidelines outlined in the DCGAN paper, which used deep convolutional networks instead of fully connected ones to achieve better-quality images.
Generator and Discriminator Architecture
- The generator takes in a 100-dimensional vector of noise and uses transpose convolutions to upscale the spatial resolution until it produces a 64x64 image.
- The discriminator uses strided convolution layers instead of pooling layers, batch normalization, and the leaky ReLU activation in every layer; its final layer maps down to a single value passed through a sigmoid.
Guidelines for Stable GAN Training
- Remove pooling layers and fully connected hidden layers; use only (strided) convolutions.
- Use batch normalization in both generator and discriminator.
- In the generator, use the ReLU activation for all layers except the output, which uses tanh; in the discriminator, use leaky ReLU for all layers.
- Use a mini-batch size of 128. Initialize weights from a normal distribution with mean 0 and standard deviation 0.02, and use a slope of 0.2 in leaky ReLU. Use the Adam optimizer with a learning rate of 2e-4 and beta1 set to 0.5 instead of the standard value of 0.9 to stabilize training.
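As a concrete sketch of that initialization rule (assuming PyTorch; the function name initialize_weights matches the import mentioned later in the training section):

```python
import torch.nn as nn

def initialize_weights(model):
    # DCGAN guideline: all conv, conv-transpose, and batch-norm weights
    # are drawn from a normal distribution with mean 0 and std 0.02.
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d, nn.BatchNorm2d)):
            nn.init.normal_(m.weight.data, 0.0, 0.02)
```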
Implementation Details
- Create a model.py file to implement the discriminator and generator classes.
- Inherit from the nn.Module class.
- Define a reusable block method that takes in channels, out channels, kernel size, stride, and padding as arguments.
- For the discriminator class, take the image channels (channels_img) and features_d as constructor arguments.
- For the generator class, take the noise dimension (z_dim, 100 here), the image channels, and features_g; its forward pass then takes the noise vector itself.
Discriminator
In this section, the speaker explains how to create a discriminator using PyTorch.
Creating the Discriminator
- The first step is to define the input and output channels of each convolutional block.
- Then, set the kernel size, stride, and padding for each Conv2d layer (4, 2, 1 halves the spatial size).
- Follow each convolution with batch normalization and the LeakyReLU activation (slope 0.2).
- Define forward(self, x), which passes the input through the sequential model; a full sketch follows this list.
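Following those steps, a minimal sketch of the discriminator might look like this (the layer widths and the 4/2/1 kernel/stride/padding choice assume the 64x64 input discussed in the Input Shapes section):

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, channels_img, features_d):
        super().__init__()
        self.disc = nn.Sequential(
            # Input: N x channels_img x 64 x 64
            nn.Conv2d(channels_img, features_d, kernel_size=4, stride=2, padding=1),  # 32x32
            nn.LeakyReLU(0.2),
            self._block(features_d, features_d * 2, 4, 2, 1),      # 16x16
            self._block(features_d * 2, features_d * 4, 4, 2, 1),  # 8x8
            self._block(features_d * 4, features_d * 8, 4, 2, 1),  # 4x4
            # Collapse the 4x4 map to a single real/fake score per example
            nn.Conv2d(features_d * 8, 1, kernel_size=4, stride=2, padding=0),  # 1x1
            nn.Sigmoid(),
        )

    def _block(self, in_channels, out_channels, kernel_size, stride, padding):
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.LeakyReLU(0.2),
        )

    def forward(self, x):
        return self.disc(x)
```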
Generator
In this section, the speaker explains how to create a generator using PyTorch.
Creating the Generator
- Define __init__(self, ...) with the noise dimension (z_dim), the channels of the image, and features_g.
- Call the super method.
- Create a _block(self, in_channels, out_channels, kernel_size, stride, padding) helper.
- Return an nn.Sequential containing nn.ConvTranspose2d with those in channels, out channels, kernel size, stride, and padding. Set bias=False because batch norm follows.
- Use BatchNorm2d, then the ReLU activation; a full sketch follows this list.
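Assembled from those pieces, a sketch of the generator (the number of blocks assumes the 64x64 output; starting at features_g * 16 mirrors the discriminator's widths in reverse):

```python
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim, channels_img, features_g):
        super().__init__()
        self.gen = nn.Sequential(
            # Input: N x z_dim x 1 x 1
            self._block(z_dim, features_g * 16, 4, 1, 0),           # 4x4
            self._block(features_g * 16, features_g * 8, 4, 2, 1),  # 8x8
            self._block(features_g * 8, features_g * 4, 4, 2, 1),   # 16x16
            self._block(features_g * 4, features_g * 2, 4, 2, 1),   # 32x32
            nn.ConvTranspose2d(features_g * 2, channels_img, kernel_size=4, stride=2, padding=1),  # 64x64
            nn.Tanh(),  # output in [-1, 1] to match the normalized data
        )

    def _block(self, in_channels, out_channels, kernel_size, stride, padding):
        return nn.Sequential(
            nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride, padding, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(),
        )

    def forward(self, x):
        return self.gen(x)
```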
Input Shapes
In this section, the speaker goes over input shapes for both generator and discriminator.
- The input shape for the discriminator is N x channels_img x 64 x 64; the generator instead takes the noise reshaped to N x z_dim x 1 x 1.
- After each block in the discriminator, the spatial size halves: 32x32, then 16x16, 8x8, and finally 4x4 pixels.
- At the end, when the feature map is 4x4, one more Conv2d layer maps features_d * 8 channels down to a single channel representing whether the image is fake or real.
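A quick way to verify these shapes is a small test like the following, using the sketches above (the specific batch size and feature values are illustrative):

```python
import torch

def test():
    N, channels_img, H, W = 8, 3, 64, 64
    z_dim = 100
    x = torch.randn((N, channels_img, H, W))
    disc = Discriminator(channels_img, features_d=8)
    assert disc(x).shape == (N, 1, 1, 1)       # one score per example
    z = torch.randn((N, z_dim, 1, 1))
    gen = Generator(z_dim, channels_img, features_g=8)
    assert gen(z).shape == (N, channels_img, H, W)  # full-size image out
    print("Shapes OK")

test()
```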
Setting up the Training
In this section, the speaker sets up the hyperparameters and imports necessary libraries for training.
Importing Libraries
- The speaker imports all necessary libraries for training, along with the Discriminator and Generator classes and the initialize_weights function from model.py.
Setting Hyperparameters
- The device is set with torch.device: CUDA if available, otherwise CPU.
- Learning rate is set to 2e-4, batch size to 128, and image size to 64. Channels of the image are set to 1 for the MNIST dataset. The noise (z) dimension is 100, the number of epochs is five, and features are set to 64 for both generator and discriminator.
Setting Transforms
- Transforms are composed from transforms.Resize, transforms.ToTensor, and transforms.Normalize with a list of 0.5 per channel for both mean and std. Writing the normalization per channel keeps it general, so the code keeps working later when switching between the MNIST and celebrity datasets.
Loading Data Sets
- The data loader is created using DataLoader with the batch size specified earlier and shuffle=True.
Initializing Generator and Discriminator
- The generator and discriminator are initialized with the noise dimension (z_dim = 100), channels_img = 1 (for MNIST), and features for both generator and discriminator equal to 64, then moved to CUDA if available.
Optimizer and Loss Function Setup
- Both optimizers are defined using optim.Adam with the learning rate specified earlier and betas=(0.5, 0.999).
- The loss function is defined as nn.BCELoss.
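Putting the whole setup together, a sketch under these settings (the MNIST root path "dataset/" is an assumption; Generator, Discriminator, and initialize_weights come from model.py as sketched above):

```python
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Hyperparameters from the DCGAN guidelines above
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
LEARNING_RATE = 2e-4
BATCH_SIZE = 128
IMAGE_SIZE = 64
CHANNELS_IMG = 1  # 1 for MNIST, 3 for CelebA
Z_DIM = 100
NUM_EPOCHS = 5
FEATURES_DISC = 64
FEATURES_GEN = 64

transform = transforms.Compose([
    transforms.Resize(IMAGE_SIZE),
    transforms.ToTensor(),
    # One 0.5 per channel so this stays valid for any channel count
    transforms.Normalize(
        [0.5 for _ in range(CHANNELS_IMG)],
        [0.5 for _ in range(CHANNELS_IMG)],
    ),
])

dataset = datasets.MNIST(root="dataset/", train=True, transform=transform, download=True)
loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)

gen = Generator(Z_DIM, CHANNELS_IMG, FEATURES_GEN).to(device)
disc = Discriminator(CHANNELS_IMG, FEATURES_DISC).to(device)
initialize_weights(gen)
initialize_weights(disc)

opt_gen = optim.Adam(gen.parameters(), lr=LEARNING_RATE, betas=(0.5, 0.999))
opt_disc = optim.Adam(disc.parameters(), lr=LEARNING_RATE, betas=(0.5, 0.999))
criterion = nn.BCELoss()
```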
The Training Loop
In this section, we write the training loop for our GAN model.
Training the Discriminator
- We train the discriminator to maximize log(D(x)) + log(1 - D(G(z))).
- We reshape the discriminator output to a single value per example.
- We calculate the BCE loss on the discriminator's output for the real and for the fake data.
- We compute loss_disc as (loss_disc_real + loss_disc_fake) / 2.
- Finally, we call loss_disc.backward() and opt_disc.step(); see the sketch after this list.
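Inside the loop over batches, the discriminator step might look like this (real is a batch from the loader moved to the device; gen, disc, criterion, opt_disc, Z_DIM come from the setup above):

```python
# For each batch of real images:
noise = torch.randn(real.shape[0], Z_DIM, 1, 1).to(device)
fake = gen(noise)

# Maximize log(D(x)) + log(1 - D(G(z))), written as two BCE terms
disc_real = disc(real).reshape(-1)           # one value per example
loss_disc_real = criterion(disc_real, torch.ones_like(disc_real))
disc_fake = disc(fake.detach()).reshape(-1)  # detach: don't update the generator here
loss_disc_fake = criterion(disc_fake, torch.zeros_like(disc_fake))
loss_disc = (loss_disc_real + loss_disc_fake) / 2

disc.zero_grad()
loss_disc.backward()
opt_disc.step()
```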
Training the Generator
- We train the generator to minimize log(1 - D(G(z))), which in practice is done by maximizing log(D(G(z))).
- We generate fake images from a fixed noise vector, build a grid with torchvision.utils.make_grid, and write it to TensorBoard.
- A step variable is incremented so TensorBoard shows the progression of the images generated by the generator; see the sketch after this list.
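Continuing the same loop body, a sketch of the generator step and the TensorBoard logging (fixed_noise, the two SummaryWriter instances writer_real and writer_fake, step, and batch_idx are assumed to be created outside the loop, e.g. fixed_noise = torch.randn(32, Z_DIM, 1, 1).to(device)):

```python
import torchvision

# Minimize log(1 - D(G(z))) by instead maximizing log(D(G(z)))
output = disc(fake).reshape(-1)
loss_gen = criterion(output, torch.ones_like(output))
gen.zero_grad()
loss_gen.backward()
opt_gen.step()

# Every so often, log image grids generated from the same fixed noise
if batch_idx % 100 == 0:
    with torch.no_grad():
        fake_samples = gen(fixed_noise)
        img_grid_real = torchvision.utils.make_grid(real[:32], normalize=True)
        img_grid_fake = torchvision.utils.make_grid(fake_samples[:32], normalize=True)
        writer_real.add_image("Real", img_grid_real, global_step=step)
        writer_fake.add_image("Fake", img_grid_fake, global_step=step)
    step += 1
```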
Introduction to DCGAN
In this section, the speaker introduces the concept of Deep Convolutional Generative Adversarial Networks (DCGAN) and explains how it differs from a fully connected network.
DCGAN vs Fully Connected Network
- DCGAN is a type of neural network that uses convolutional layers instead of fully connected layers.
- The convolutional layers help to capture spatial information in images, making it more effective for image generation tasks.
- The speaker suggests trying out a fully connected network for comparison.
Using Celeb A Dataset
In this section, the speaker explains how to use the Celeb A dataset with DCGAN.
Setting up the Dataset
- The Celeb A dataset contains images of celebrities and can be downloaded from Kaggle.
- To use the dataset with DCGAN, create a folder called "CelebA" and place another folder inside it containing all the images; the extra level is needed because ImageFolder expects one subdirectory per class.
- Use datasets.ImageFolder to load the dataset automatically.
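A sketch of that loading step (the root path "celeb_dataset" is an assumption; transform is the same Compose defined earlier, with CHANNELS_IMG set to 3):

```python
import torchvision.datasets as datasets

# ImageFolder treats each subdirectory as a class label, so the single
# inner folder of images acts as the one and only "class".
dataset = datasets.ImageFolder(root="celeb_dataset", transform=transform)
```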
Training and Results
In this section, the speaker discusses training DCGAN on Celeb A dataset and presents its results.
Training on Celeb A Dataset
- Train DCGAN on Celeb A dataset by running it for multiple epochs.
- The speaker stopped training after three epochs due to time constraints but suggests training for more epochs for better results.
Results
- The generated images are far from perfect but show a clear improvement over the earlier attempts with a fully connected network.
- DCGAN is still sensitive to hyperparameters, so experimenting with them is encouraged.
Conclusion and Future Work
In this section, the speaker concludes by summarizing what was covered in this video and discussing future work.
Summary
- DCGAN is a type of neural network that uses convolutional layers for image generation tasks.
- The Celeb A dataset can be used with DCGAN by creating the right folder structure and using datasets.ImageFolder.
- Training DCGAN on Celeb A dataset shows improvement over previous attempts but is still sensitive to hyperparameters.
Future Work
- The speaker plans to focus on improving the stability of GANs in upcoming videos.
- More advanced architectures will be implemented in future videos for better performance.