Style LoRA Training Guide for Stable Diffusion 1.5 and SDXL: Concepts, Results, and Conclusion
Introduction to Training a Style for Stable Diffusion 1.5 and SDXL with LoRA
This video focuses on training a style LoRA for Stable Diffusion 1.5 and SDXL. It explains the concepts of style training, how it differs from object or character training, the data and training parameters involved, how to tell whether a style has trained well, and what to expect from the result.
Training Parameters and Data Preparation
- To create your own style, start by defining a folder for each type of object you want to train.
- Collect images from Google or various websites related to the specific object type.
- Choose larger image sizes if possible for better results.
- Organize the images in folders based on their category (e.g., animals, buildings, clothing).
- Prepare the image folder by placing the images in subfolders named "Style" and "Regularization."
- Use regularization images that match the desired style, set to one repeat.
- Caption the images with a utility such as WD14 captioning, using a prefix that indicates the style name.
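The folder layout described above can be sketched in a short script. This is a minimal, illustrative helper following the Kohya-style "<repeats>_<name>" folder convention; the names `train_data` and `mystyle` are hypothetical examples, not taken from the video.

```python
from pathlib import Path

# Illustrative helper for a Kohya-style training folder layout.
# The "<repeats>_<name>" prefix tells the trainer how many times each
# image is repeated per epoch; all names here are hypothetical examples.
def make_layout(root, style_name, repeats=2, reg_repeats=1):
    root = Path(root)
    img_dir = root / "images" / f"{repeats}_{style_name}"
    reg_dir = root / "regularization" / f"{reg_repeats}_{style_name}"
    for d in (img_dir, reg_dir):
        d.mkdir(parents=True, exist_ok=True)
    return img_dir, reg_dir

img_dir, reg_dir = make_layout("train_data", "mystyle")
print(img_dir, reg_dir)
```

The training images then go into `img_dir` and the regularization images into `reg_dir`, matching the two-repeats / one-repeat split mentioned above.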
Removing Undesired Tags
- After generating the caption tags, use a tool such as BooruDatasetTagManager to remove tags that conflict with your art style.
- Review each caption individually if possible; at a minimum, remove common undesired tags such as greyscale, monochrome, and line art.
- Save your changes periodically, since the software can be buggy.
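If you prefer to script the cleanup instead of using a GUI tool, the same tag removal can be done in a few lines. A minimal sketch, assuming comma-separated WD14-style caption files; the tag list is only an example:

```python
from pathlib import Path

# Example set of tags to strip; adjust for your own art style.
UNDESIRED = {"greyscale", "grayscale", "monochrome", "lineart", "line art"}

def clean_caption(text: str) -> str:
    # Split on commas, drop undesired tags, and rejoin.
    tags = [t.strip() for t in text.split(",")]
    return ", ".join(t for t in tags if t and t.lower() not in UNDESIRED)

def clean_folder(folder: str) -> None:
    # Rewrite every .txt caption file in place.
    for f in Path(folder).glob("*.txt"):
        f.write_text(clean_caption(f.read_text()))

print(clean_caption("mystyle, monochrome, tree, lineart, sky"))
# mystyle, tree, sky
```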
Conclusion
Training a style for Stable Diffusion 1.5 and SDXL requires careful data preparation and parameter selection. By organizing images into dedicated folders and removing undesired tags from the captions, you can train your own unique art style.
This section covers selecting the input folders and giving the output folder a suitable name.
Selecting Folders and Naming Output Folder
- Choose the "images" folder itself rather than its subfolders.
- Do not select the class (regularization) folder unless regularization is needed.
- Give the output folder a suitable name; it does not have to match the instance prompt.
In this section, we cover naming the LoRA and choosing suitable network rank and alpha values.
Naming the LoRA and Choosing Network Rank and Alpha Values
- The name given to the LoRA does not need to match the instance prompt.
- Consider including a version number or other relevant information in the name.
- Choose a suitable network rank, such as 32.
- Lowering the alpha value can result in smoother training.
- Low values (e.g., 3 or 4) can be tried for the convolutional rank.
This section discusses determining the number of epochs based on the target step count and selecting the standard LoRA type for training.
Determining Number of Epochs and Selecting Model
- The number of epochs depends on the target number of steps.
- For example, if there are 2 repeats with 400 images (800 steps per epoch), targeting 8,000 steps would require 10 epochs.
- The choice of epochs also depends on testing requirements.
- Use the standard LoRA type, which is the one most commonly used.
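The epoch arithmetic above can be written out explicitly. A minimal sketch, assuming a batch size of 1:

```python
# Epoch count needed to reach a target number of training steps.
def epochs_for_target(num_images: int, repeats: int, target_steps: int) -> int:
    steps_per_epoch = num_images * repeats
    # Ceiling division: round up so we reach at least the target step count.
    return -(-target_steps // steps_per_epoch)

# 400 images at 2 repeats -> 800 steps per epoch;
# an 8,000-step target therefore needs 10 epochs.
print(epochs_for_target(400, 2, 8000))  # 10
```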
Here we cover using bf16 mixed precision and caching latents to disk when creating multiple models.
Mixed Precision and Caching Latents
- Use bf16 for mixed precision when training on NVIDIA RTX cards.
- Cache latents to disk to speed up training when creating multiple models.
- Caching creates a small file for each image, which helps when testing new settings.
This section covers the learning rate, schedule, and advanced settings for training.
Learning Rate, Schedule, and Advanced Settings
- Keep the learning rate at its default value.
- Use the default learning rate scheduler.
- Consider other optimizers, such as AdamW 8-bit, for lower VRAM usage.
- Use 768 as the training resolution if your images all share one resolution.
- Enable bucketing if working with mixed resolutions.
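As a rough sketch, the SD 1.5 settings discussed so far might look like the following parameter set. The key names follow Kohya sd-scripts conventions; any value not stated in this summary (e.g., the alpha of 16) is an assumption for illustration only.

```python
# Sketch of the SD 1.5 settings from this section, as a Kohya-style
# argument dict. Values not stated in the text are assumptions.
sd15_args = {
    "optimizer_type": "AdamW8bit",   # lower-VRAM optimizer option
    "resolution": 768,               # when all images share one resolution
    "enable_bucket": False,          # enable only for mixed resolutions
    "network_dim": 32,               # network rank suggested earlier
    "network_alpha": 16,             # assumption: lower than the rank
    "mixed_precision": "bf16",       # suits NVIDIA RTX cards
    "cache_latents_to_disk": True,   # speeds up repeated training runs
}
print(sd15_args["resolution"])  # 768
```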
Here we discuss advanced settings related to samples and noise offset.
Samples and Noise Offset
- Generate one sample per epoch to check whether the style is coming through.
- The number of samples depends on your testing requirements.
- Set the noise offset according to the recommended values.
This section emphasizes that training lower dimension networks is faster than higher dimension networks.
Training Lower Dimension Networks
- Training lower-dimension networks is faster than training higher-dimension ones.
- Example: about 1.8 iterations per second for this dataset on an 8GB RTX 3070 GPU.
- Check results during training to assess progress and whether the style is acceptable.
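The iteration rate quoted above translates into a rough wall-clock estimate:

```python
# Rough training-time estimate from steps and iteration rate.
def training_minutes(total_steps: int, it_per_sec: float) -> float:
    return total_steps / it_per_sec / 60

# 8,000 steps at 1.8 it/s on an 8GB RTX 3070 -> roughly 74 minutes.
print(round(training_minutes(8000, 1.8)))  # 74
```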
In this section, we observe the results of training at different epochs and with different prompts.
Observing Results at Different Epochs
- Test the style by running the Kohya process and generating with various prompts (e.g., young woman, landscape) at each epoch.
- Note that training lower-dimension networks is faster than training higher-dimension ones.
- Assess the quality of the drawing style at different epoch checkpoints.
This section discusses the results obtained from training with regularization.
Results of Training with Regularization
- Trained for 5 epochs and 8,000 steps using a regularization factor of 2.
- The first epoch's results are acceptable but not good.
- The second epoch shows a very good drawing style.
- Changes in style are observed between the third, fourth, and fifth epochs.
- The generated images do not exist in the dataset, indicating the model learned the style rather than memorizing specific images.
Model Selection and Training Options
In this section, the speaker discusses the model selection and training options for the SDXL-based model. They mention that it is possible to train with or without classification images and recommend testing both options with regularization to determine which produces better results.
Model Selection
- The speaker selects the SDXL-based model.
- They choose the "images" folder as input data.
- Classification images are optional for training.
Training Options
- Regularization is recommended for training.
- Two repeats per image are used.
- A batch size of one is used, with bf16 (or fp16) mixed precision, which suits NVIDIA training.
- Default settings are kept for other parameters.
Image Resolution and Network Rank
This section focuses on image resolution and network rank settings for training the SDXL-based model.
Image Resolution
- The recommended image resolution is 1024 by 1024.
- Bucketing can be enabled if the dataset contains mixed resolutions, but in this case all files are 1024.
Network Rank
- Lower network ranks should be used than with the original SD 1.5 model.
- For example, if SD 1.5 used a network rank of 128, a value around 16 or 8 should be used here.
- Experimental values such as network alpha 4 or 1 can be tried.
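For comparison with the SD 1.5 settings, the SDXL-side values from this section might be sketched as follows. The key names again follow Kohya sd-scripts conventions, and the numbers are only the examples given above.

```python
# Illustrative SDXL settings from this section, in Kohya-style
# argument form; rank/alpha are the example values mentioned above.
sdxl_args = {
    "resolution": 1024,
    "enable_bucket": False,  # all files here are already 1024x1024
    "network_dim": 8,        # far lower than the 128 used for SD 1.5
    "network_alpha": 1,      # experimental: try 4 or 1
}
print(sdxl_args["network_dim"])  # 8
```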
Advanced Settings and GPU Considerations
This section covers advanced settings and considerations related to GPU capabilities during training.
Advanced Settings
- Gradient checkpointing can be used to reduce VRAM usage when GPU memory is limited (e.g., using a GPU with only 12GB VRAM).
- xformers can be used, along with the option to train the U-Net only, though it may not produce significant differences.
GPU Considerations
- The speaker mentions having a GPU with 24GB VRAM.
- If the GPU has 12GB VRAM, gradient checkpointing is recommended to reduce VRAM usage.
Noise Value and Instance Prompt
This section discusses setting a noise value and using an instance prompt during training.
Noise Value
- Set the noise offset based on recommended values, such as 0.05, or 0.0357 for SDXL training.
Instance Prompt
- A concept SDXL already knows can be reused as the instance prompt for faster training and potentially better results.
- For example, if training a character similar to another character in SDXL, the name of the character can be put as the instance prompt.
Art Styles and Training Options
This section focuses on art styles and different training options for SDXL.
Art Styles
- SDXL already covers most art styles, so selecting a specific art style may not produce significant improvements.
Training Options
- The speaker starts testing with the "train U-Net only" option.
- Another test will be conducted without the U-Net-only option, training the full network as well.
- The same parameters as before are used for both tests.
Testing and Comparing Results
In this section, the speaker explains how to conduct testing and compare results using different settings.
Testing Process
- Select a checkpoint (e.g., Photon for Stable Diffusion 1.5) for testing.
- Use the X/Y/Z plot script with prompt search/replace to swap in different values.
Comparing Regularization vs. Non-Regularization
- Test with regularization and without regularization.
- Use a random seed to generate multiple images.
Comparing Different Weights
- Test the style using weights from different epochs (e.g., epoch 2 and epoch 4).
- Generate pictures for each epoch and compare the results.
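The comparison grid described here can be sketched programmatically. A minimal example assuming the common `<lora:name:weight>` prompt syntax; the file names and prompt are illustrative, not from the video:

```python
import itertools

# Hypothetical epoch checkpoints and LoRA strengths to compare.
epochs = ["mystyle-000002", "mystyle-000004"]  # epoch-2 and epoch-4 weights
strengths = [0.5, 1.0]

# One prompt per (epoch, strength) cell of the comparison grid.
prompts = [
    f"a young woman, <lora:{name}:{w}>"
    for name, w in itertools.product(epochs, strengths)
]
for p in prompts:
    print(p)
```

Feeding these prompts through the X/Y/Z plot script (or any batch runner) with a fixed seed produces directly comparable images per epoch and weight.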
Results Analysis
This section focuses on analyzing the results obtained from testing different settings.
Style Strength and Regularization
- A full-strength weight of 1 produces the best results.
- Regularization generally produces better results than without regularization.
Regularization and Results Comparison
The speaker discusses the impact of regularization on the results obtained in image sketching. They compare the outcomes with and without regularization, highlighting that regularization tends to produce better results.
Regularization Effects on Sketching Results
- With regularization, the produced sketches appear more refined than those without.
- At a weight of 1, the results are good and acceptable.
- When mixed with another LoRA at half weight, it turns the image black and white while slightly altering its appearance.
- At a weight of 2, although not always perfect, it still produces recognizable objects.
Further Testing and Flexibility of Regularization
The speaker emphasizes conducting additional tests using different checkpoints and weights to explore the effects of regularization further. They highlight that regularization generally provides more flexible results in terms of sketching and style.
Testing Different Checkpoints and Weights
- Tests should be conducted on various checkpoints with different objects to observe how regularization performs.
- At a weight of 1 or 1.5, good results resembling hand sketches are achieved.
- Even at a weight of 2, acceptable outcomes can be obtained for certain objects.
- Regularization proves to be more flexible in producing desired sketching effects compared to other methods.
Style Learning in Stable Diffusion SDXL
The speaker discusses applying style learning techniques in Stable Diffusion SDXL. They compare different epochs and variations in training options to evaluate their impact on image style.
Comparing Different Training Options
- Comparisons are made between different epochs and training options (e.g., U-Net only).
- Results show slight differences between epochs, but no significant improvement is observed with the U-Net-only option.
- The tree image's style changes slightly when trained with U-Net only, and the resulting file is smaller.
- Regularization is suggested to enhance results even further.
Results and Conclusion
The speaker summarizes the findings of the study and draws conclusions based on the experimental results. They highlight the applicability of the principles discussed and emphasize the importance of regularization for achieving desired outcomes.
Findings and Final Conclusion
- The principles discussed in this video can be applied to various styles.
- Regularization consistently improves results compared to not using it.
- Stable Diffusion SDXL demonstrates potential for style learning, building on existing techniques.
- Further experimentation and testing are recommended to explore different strengths and variations in regularization.