How to train your own AI image models without breaking the bank
Introduction and Background
In this section, the speaker introduces themselves and provides a brief background before diving into the topic of training AI models without breaking the bank.
- The speaker is Maxime De Greve, a designer at GitHub originally from Belgium but living in the UK for over a decade.
- The presentation will cover how to train AI models affordably.
- The speaker mentions that Belgium is known for its beers, chocolate, and waffles.
Introduction to AI Models
This section provides an introduction to artificial intelligence (AI) models and their different types.
- Artificial intelligence refers to machines that can learn, reason, and perform tasks requiring human-like understanding or cognitive abilities.
- Examples of AI models include ChatGPT (a virtual assistant for answering questions), as well as models that can output images.
- Different tools for generating images include OpenAI's DALL·E, Midjourney, and Stable Diffusion by Stability AI.
- ChatGPT is based on a large language model, while other models like Stable Diffusion are based on latent diffusion models.
Text-to-Image Models
This section focuses on text-to-image models and their capabilities.
- Text-to-image models like DALL·E can generate photorealistic images based on given prompts.
- These models use latent diffusion techniques to generate diverse and high-quality samples.
Difference Between Models
This section explains the difference between ChatGPT (a language-based model) and latent diffusion-based models.
- ChatGPT is based on a large language model that understands and generates human-like language with complexity and nuance.
- Latent diffusion models, like Stable Diffusion and Midjourney, learn a sequence of diffusion steps to generate data from random noise vectors in a probabilistic manner.
- The presentation simplifies the models as text-to-text or text-to-image models for easier understanding.
Cost of Training AI Models
This section discusses the high cost associated with training AI models.
- Training AI models can be expensive due to the infrastructure required, not necessarily the size of the team or dataset.
- OpenAI reportedly spent over $100 million on training their latest model, primarily due to needing access to thousands of GPUs.
- Each GPU unit costs around $10,000, resulting in significant expenses for setting up or renting infrastructure.
- However, the cost is expected to decrease as technology improves.
GPU Explanation
This section provides a brief explanation of GPUs (graphics processing units).
- GPUs are graphics processing units commonly found in gaming PCs.
- They play a crucial role in accelerating computations required for training AI models.
Training Complex Generative Models
The speaker advises against training complex generative models from scratch and suggests leaving it to the professionals. Instead, they recommend fine-tuning existing pre-trained models on smaller datasets for specific tasks or domains.
Fine-Tuning Existing Models
- Fine-tuning involves using a pre-trained model and updating its parameters on a smaller dataset to adapt it to a specific task or domain.
- Open-source models make fine-tuning easy and accessible, with far fewer restrictions than closed models.
- There are already many great fine-tuned models available online for various purposes, such as generating images in specific styles like Studio Ghibli or character artwork.
When to Create Your Own Model
- Training your own model is necessary when you want to achieve specific features, styles, or reduce mistakes in representation.
- For example, if you want to generate AI images of your dog, training a model specifically for your dog's breed would be beneficial.
- Similarly, if you want to generate images of a particular celebrity like Pedro Pascal, training your own model would be necessary as existing models may not accurately represent individual celebrities.
- However, for generic objects like hamburgers, it may not be necessary to train your own model as generic objects are easy to create with existing models.
Creating Your Own Model
The speaker discusses the process of creating their own models and provides insights into collecting training data and the hardware requirements.
Collecting Training Data
- More training data is generally better as long as it is consistent in representing the desired features or style.
- For face-related tasks, around 10 images can be sufficient for training. For style-related tasks like avatars, 50 to 100 images may be needed.
- Images should be cropped to 512 by 512 pixels and given a unique token name that does not collide with words the model already associates with real-world meanings.
- The speaker shares examples of collected images for training a model specific to the GitHub mascot, Mona, and avatars.
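The cropping-and-renaming step above can be scripted. Below is a minimal sketch using Pillow; the folder paths and the `ghconf` token name are illustrative assumptions, not values from the talk:

```python
import os
from PIL import Image  # Pillow: pip install Pillow

def center_crop_square(img: Image.Image, size: int = 512) -> Image.Image:
    """Center-crop an image to a square, then resize it to size x size."""
    w, h = img.size
    side = min(w, h)
    left = (w - side) // 2
    top = (h - side) // 2
    cropped = img.crop((left, top, left + side, top + side))
    return cropped.resize((size, size), Image.LANCZOS)

def prepare_dataset(src_dir: str, dst_dir: str, token: str = "ghconf") -> None:
    """Crop every image in src_dir and save it under a unique token-based name."""
    os.makedirs(dst_dir, exist_ok=True)
    images = sorted(f for f in os.listdir(src_dir)
                    if f.lower().endswith((".png", ".jpg", ".jpeg")))
    for i, name in enumerate(images, start=1):
        img = Image.open(os.path.join(src_dir, name)).convert("RGB")
        center_crop_square(img).save(os.path.join(dst_dir, f"{token} ({i}).png"))
```

Numbering the files with one shared token keeps the names unique while letting the fine-tuning notebook associate all of them with the same concept.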
Using Existing Models for Training Data
- Existing models can be used to generate training data. For example, using Midjourney to generate images and then fine-tuning Stable Diffusion on those images.
- This approach eliminates the need to use images from the internet as training data.
Hardware Requirements
- Training AI models requires a computer with a powerful GPU.
- Most Macs do not have GPUs capable of training AI models, so it is important to ensure that the computer being used has a suitable GPU.
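If you want to train locally rather than in the cloud, a quick preflight check helps. This sketch only tests whether the NVIDIA driver tool `nvidia-smi` is present and runs; it is a rough proxy for "has a suitable GPU", not a guarantee of training capability:

```python
import shutil
import subprocess

def has_nvidia_gpu() -> bool:
    """Return True if the NVIDIA driver tool `nvidia-smi` is on PATH and runs."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        subprocess.run(["nvidia-smi"], capture_output=True, check=True)
        return True
    except (subprocess.CalledProcessError, OSError):
        return False

print("NVIDIA GPU available:", has_nvidia_gpu())
```

On a Mac this will typically print `False`, which is exactly the situation the talk warns about, and a cue to use a cloud service like Google Colab instead.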
Getting Started with Google Colab
This section explains how to get started with Google Colab, a platform powered by Jupyter Notebook.
Opening a Notebook in Google Colab
- Access Google Colab by going to colab.research.google.com in Chrome.
- Click on "File" and then "Open notebook".
- In the GitHub tab, enter the URL of the desired IPython notebook file.
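As an alternative to the "Open notebook" dialog, Colab can open a GitHub-hosted notebook directly from a URL of the following shape. The user, repo, branch, and notebook names below are placeholders, not a real repository from the talk:

```python
# URL pattern for opening a GitHub-hosted notebook directly in Google Colab.
# All {placeholders} are hypothetical values for illustration.
colab_url = ("https://colab.research.google.com/github/"
             "{user}/{repo}/blob/{branch}/{notebook}.ipynb")

print(colab_url.format(user="someuser", repo="somerepo",
                       branch="main", notebook="train"))
```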
Setting Up the Model
- Scroll through the file and make necessary changes.
- Update the model version to Stable Diffusion 2.1 (the 512-pixel variant).
- Create and load a session named "GH conf".
- Adjust the training steps based on the total number of training data images prepared.
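The talk does not give an exact formula for choosing the step count, but a common DreamBooth-style rule of thumb (an assumption here, not the speaker's number) is roughly 100 training steps per image:

```python
def suggested_training_steps(num_images: int, steps_per_image: int = 100) -> int:
    """Rough heuristic (assumed, not from the talk): scale training steps
    with dataset size, at about 100 steps per training image."""
    return num_images * steps_per_image

# ~10 face photos or ~50-100 style images, per the dataset sizes mentioned earlier
print(suggested_training_steps(10))
print(suggested_training_steps(50))
```

Treat the multiplier as a starting point: too few steps underfits the concept, while far too many can overfit and make every output look like a training image.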
Running Cells and Training
- Click on the play button next to each cell to run them chronologically.
- Skip cells that are not needed, such as those for captioning concept images and uploading the trained model.
- Pay attention to the cell responsible for training, which will display a progress bar with a green word indicating token generation.
Generating Images Using Fine-tuned Model
This section explains how to generate images using the fine-tuned model in Google Colab.
Testing the Model
- After training is complete, run the next cell to launch a website for testing.
- Click on the generated URL after a few minutes.
Image Generation Parameters
- Enter "GH conf" as the prompt in the first input field.
- Use negative text prompts to exclude specific elements from generated images.
Sampling Steps and CFG Scale
- Adjust sampling steps for more iterations when generating images.
- The CFG scale determines how closely generated images follow text prompts.
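If you later script generation with the `diffusers` library instead of the web UI, these fields map onto keyword arguments. The values below are illustrative assumptions, not settings from the talk:

```python
# Web-UI field -> diffusers keyword argument (illustrative values only)
generation_params = {
    "prompt": "GH conf",                        # the fine-tuned token as the prompt
    "negative_prompt": "blurry, extra limbs",   # elements to exclude from the image
    "num_inference_steps": 30,                  # "sampling steps": more iterations
    "guidance_scale": 7.5,                      # "CFG scale": prompt adherence
}

# With a loaded StableDiffusionPipeline this would be called roughly as:
#   image = pipe(**generation_params).images[0]
print(generation_params)
```

Higher `guidance_scale` values follow the prompt more literally at the cost of variety; more `num_inference_steps` means slower but often cleaner results.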
Examples of Generated Images
This section showcases examples of images generated using the fine-tuned model.
Prompt Examples
- Examples of prompts used, such as "Mona as a Jedi" or "Mona in a supermarket".
- Negative prompts used to exclude certain elements from the images.
Improved Image Quality
- Comparison between the non-trained and trained models.
- Improved lighting, positioning, and overall image quality with the trained model.
Diversity in Generated Images
- Importance of diversity in avatar models to represent everyone.
- Examples of diverse images generated using different prompts.
Final Touches and Addressing Mistakes
This section discusses final touches and addressing mistakes in generated images.
Addressing Mistakes
- Acknowledgment that generative models may contain mistakes.
- Example of an avatar with noticeable mistakes.
How to Access the Feature
In this section, the speaker explains how to access a specific feature on a website.
Accessing the Feature
- Click on the "Image to Image" option at the top of the website.
- Replace the prompt with "wool jumper".
- Click on "In Paint" and paint the area that needs to be replaced.
Regenerating Brushed Areas
The speaker demonstrates how to regenerate brushed areas for better results.
Regenerating Brushed Areas
- Continuously regenerate brushed areas until satisfied with the results.
- Notice significant improvement in the last three images, especially in rendering eyes.
Challenges and Manual Work Involved
The speaker discusses challenges faced during regeneration and emphasizes manual work involved for optimal results.
Challenges and Manual Work
- Difficulty in regenerating eyes required trial and error.
- Artifacts may appear around painted areas, requiring manual cleanup using tools like Photoshop or Pixelmator Pro.
- AI is not a foolproof solution; manual intervention is often necessary for best outcomes.
Upsizing Images Using AI
The speaker introduces upsizing techniques using AI for larger image sizes.
Upsizing Images
- Use AI-based upscaling technique available on the same website used earlier.
- Click on "Extras", then select "Upscaler 1 SRGA and 4x" option.
- Generate an upscaled image four times larger in each dimension (e.g., from 512 pixels to 2048 pixels).
- Retain details even after resizing, achieving almost production-quality images.
Image-to-Image Generation
The speaker explains how to generate variants of an image using an image-to-image technique.
Image-to-Image Generation
- Access the same website used previously.
- Click on "Image to Image" option.
- Enter the desired prompt, but switch a word (e.g., from British male to British female).
- Upload the original image and generate variants based on the modified prompt.
- Quickly create a diverse collection of people for an avatar library.
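The swap-a-word trick above can be scripted to build a varied prompt list quickly. The base prompt and descriptor list here are made-up examples, not prompts from the talk:

```python
# Hypothetical base prompt and descriptors for generating avatar variants
base_prompt = "avatar portrait of a {person}, studio lighting"
descriptors = ["British male", "British female",
               "elderly Japanese man", "young Nigerian woman"]

# One prompt per descriptor, each feeding one image-to-image run
prompts = [base_prompt.format(person=d) for d in descriptors]
for p in prompts:
    print(p)
```

Pairing each prompt with the same source image keeps pose and composition consistent across the avatar set while the subject varies.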
Observations and Recommendations
The speaker shares observations and recommendations regarding AI avatars.
Observations and Recommendations
- While generating images, the speaker noticed similarities with an actor from "The White Lotus" series, raising questions about whether the AI model was trained on that actor.
- The speaker suggests trying out their Figma plugin called "Tiny Faces" or visiting a website called "Tiny Faces" for more AI avatars.