Finally, AI for music production! Free & open source

Foundation One: A Free AI for Music Production

Overview of Foundation One

  • Foundation One is a free and open-source AI tool designed for music production, capable of creating separate stems that can be mixed and mastered manually.
  • The model allows users to specify tempo, key, and bar count, making it versatile for various musical styles.
  • It understands musical concepts such as BPM, major/minor keys, and various effects like reverb and distortion.

Demonstrations of Capabilities

  • An example prompt generates a bass track with specific effects (medium delay, medium reverb), confirming its ability to adhere to specified settings (8 bars at 140 BPM in E minor).
  • Another demo showcases the generation of a gritty bass track with defined synth shapes (small square), demonstrating the model's flexibility in sound design.
  • Additional examples include flute and trumpet sounds that highlight the model's capability to produce diverse instrument tones.

Personal Demo Experience

  • The presenter shares their personal experience using Foundation One to generate samples for layering in a Digital Audio Workstation (DAW).
  • The interface allows users to input detailed prompts; the presenter demonstrates generating an audio track based on specific keywords within 15 seconds.

Layering Tracks in DAW

  • After generating an initial synth track, the presenter layers additional tracks while maintaining consistent BPM and key for cohesion.
  • A second generated track features upbeat piano sounds that blend well with the first track.
  • The presenter highlights the option to download MIDI files from generated audio clips for further manipulation with different instruments.

Creating Music with AI Tools

Importing and Modifying MIDI Files

  • The speaker demonstrates how to import a MIDI file into their digital audio workstation (DAW) by dragging it onto the interface.
  • After soloing the track, they play the sound using a chimes instrument, noting that while it's not perfect, it closely resembles the original melody.
  • To improve clarity, they decide to move all tracks up one octave due to excessive mid frequencies.

Adding New Instrument Tracks

  • The speaker expresses interest in adding staccato strings to enhance the composition and formulates a prompt for this purpose.
  • They download the generated track and integrate it into their DAW, resulting in five total tracks. They anticipate that the initial mix will sound noisy but plan to refine it later.

Mixing and Finalizing Composition

  • The speaker plans to manually add bass and drums since percussion sounds are outside the AI's capabilities. They also mention rearranging elements for better mixing.
  • After three minutes of work, they present a simple song created with AI-generated samples that align in BPM and key.

Overview of AI Capabilities

  • The speaker highlights the flexibility of using AI tools for music creation, including downloading MIDI files for use with different virtual instruments or generating new tracks based on existing ones.

Introduction to Chat LLM by Abacus AI

  • The video introduces Chat LLM as an all-in-one platform for accessing various AI models seamlessly within one interface.
  • It features quick integration of new models and includes useful tools like artifacts for side-by-side generation previews.

Features of Chat LLM

  • Users can access multiple image generators and video models at a low monthly cost compared to individual subscriptions.
  • The platform supports numerous instruments trained on Foundation One, excelling particularly in synth sounds among other categories like strings and brass.

Instrument Breakdown & Sample Control

  • A detailed breakdown is provided regarding supported instruments such as synth leads, pianos, strings, wind instruments, etc., emphasizing limitations on unsupported types.

Controlling Sound Characteristics

  • Users can manipulate sample characteristics through keywords related to timbre (e.g., warm or bright), effects (e.g., reverb or delay), and musical structure (e.g., melody or arpeggio).

Installation Requirements

  • Installation instructions highlight that the model runs efficiently on consumer GPUs with a minimum VRAM requirement of 8 GB. Future quantized versions may accommodate lower VRAM setups.

Installation and Setup of RC Stable Audio Tools

System Requirements and Initial Setup

  • The official version of the software requires 8 GB of VRAM, with an RTX 3090 capable of processing samples in approximately 7 to 8 seconds.
  • Users are recommended to download the "RC Stable Audio Tools" fork for optimal performance; a link is provided in the description.

Installing Git

  • Before proceeding, ensure that Git is installed on your computer. Instructions for installation are available if needed.
  • For Windows users, downloading the appropriate .exe file is necessary; follow default installation settings for ease.

Cloning the Repository

  • After installing Git, choose a directory (e.g., Desktop) to clone the repository using command prompt commands.
  • Once cloned, a new folder containing all files from the GitHub repository will appear on your desktop.
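The cloning step above can be sketched as follows. The repository URL comes from the video description; the Desktop path is just an example location, and the commands assume a Windows Command Prompt.

```shell
:: Move to the folder where the tool should live (Desktop is only an example),
:: then clone the fork from GitHub.
cd %USERPROFILE%\Desktop
git clone https://github.com/RoyalCities/RC-stable-audio-tools
cd RC-stable-audio-tools
```

After this, a new `RC-stable-audio-tools` folder containing the repository files appears in the chosen directory.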

Creating a Virtual Environment

  • Navigate into the newly created folder and prepare to set up a virtual environment. Using Anaconda or Miniconda is suggested for managing Python environments effectively.
  • Miniconda is recommended over Anaconda because it is smaller and installs faster while still providing the essential packages.

Finalizing Installation Steps

  • After downloading Miniconda, run the installer, agree to the terms, and make it accessible for all users.
  • Ensure that the conda path is added to the system environment variables so the conda command is recognized in Command Prompt after installation.

Setting Up Python Version

  • It’s advised to use Python version 3.10 as newer versions may cause dependency resolution issues during setup.
  • Create a new virtual environment named "stable-audio" using Python 3.10 within your terminal session. This step helps manage dependencies more efficiently when working with AI tools.
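The environment setup above amounts to one command. The name "stable-audio" is taken from the tutorial (conda environment names cannot contain spaces, hence the hyphen); Python is pinned to 3.10 because newer versions may cause dependency resolution issues.

```shell
:: Create an isolated environment pinned to Python 3.10,
:: then switch into it before installing anything.
conda create -n stable-audio python=3.10
conda activate stable-audio
```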

Setting Up a Virtual Environment for Stable Audio

Creating a Separate Environment

  • The virtual environment acts like a separate hard drive, isolating packages and dependencies to prevent conflicts with existing AI tools or Python versions on your computer.
  • Activate the new virtual environment with the command conda activate stable-audio; the changed prompt indicates you are now operating within this isolated space.

Installing Dependencies

  • If you have a CUDA-capable GPU, install the CUDA build of PyTorch first, as it enables GPU acceleration on compatible hardware.
  • The CUDA builds of PyTorch, TorchVision, and TorchAudio are substantial (over 2 GB), so the download takes time depending on internet speed. A successful installation shows no error messages.
  • After installing PyTorch, install Stable Audio Tools; again, success is indicated by the absence of error messages.
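The dependency installs above might look like this inside the activated environment. The cu121 wheel index is an example: match the CUDA version to your driver. Depending on how the fork packages itself, the second step may instead be `pip install .` from inside the cloned folder.

```shell
:: Install the CUDA build of PyTorch first (large download, 2+ GB).
:: cu121 is an example CUDA version -- pick the one matching your driver.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

:: Then install the audio tools (or "pip install ." inside the repo folder).
pip install stable-audio-tools
```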

Starting from Scratch

  • To restart the setup process later, navigate to the Stable Audio Tools folder and open Command Prompt by typing cmd.
  • Activate your virtual environment again with conda activate stable-audio before launching the interface with python run_gradio.py.
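The restart sequence above, condensed into three commands (run_gradio.py is the launch script's name in the upstream stable-audio-tools repository; the fork's exact flags may differ):

```shell
:: From the folder containing the cloned repo, re-enter the environment
:: and launch the Gradio interface.
cd RC-stable-audio-tools
conda activate stable-audio
python run_gradio.py
```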

Downloading Models

  • Upon starting the interface, users are prompted to download a model (Foundation 1 model), which is approximately 2.4 GB in size and may take several minutes.
  • After downloading completes successfully, users can re-enter their commands to access the interface again.

Generating Audio Clips

  • Once set up correctly with models downloaded, users can generate audio clips by entering prompts that describe desired audio characteristics.
  • Users can specify various parameters such as number of bars, beats per minute (BPM), key selection, and seed value for randomness in generation.

Adjusting Settings for Quality

  • The seed value influences variations in generated outputs; setting it at -1 allows randomization for diverse results each time.
  • Additional settings include adjusting steps for quality—more steps yield higher quality but longer generation times; around 75 steps is suggested as an optimal balance.
  • CFG values determine how closely AI adheres to prompts; higher values mean stricter adherence while lower values introduce creativity and randomness.
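The seed convention above can be sketched with a tiny shell helper (purely illustrative; this function is not part of the tool): -1 requests a fresh random seed on every run, while any other value is reused verbatim so a generation can be reproduced.

```shell
# resolve_seed: -1 requests a random seed; any other value is kept as-is.
resolve_seed() {
  if [ "$1" -eq -1 ]; then
    # Derive a pseudo-random 32-bit seed (awk's rand() seeded from the clock).
    awk 'BEGIN { srand(); printf "%d\n", int(rand() * 4294967295) }'
  else
    echo "$1"
  fi
}

resolve_seed 42   # prints 42: a fixed seed gives reproducible output
resolve_seed -1   # prints a random seed, so each generation differs
```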

Foundation One: A Quick Overview of Features

Generating Audio Clips

  • The speaker demonstrates generating an eight-bar audio clip using an RTX 5000 Ada with 16 GB of VRAM, noting it takes less than 20 seconds.
  • The generated audio includes a piano roll display, allowing users to see the exact keys played and download the MIDI file for use in a DAW (Digital Audio Workstation).
  • Users can select different instruments to play back the MIDI notes, showcasing flexibility in sound customization.

Style Transfer Feature

  • The style transfer feature allows users to upload a reference clip to influence the output's style. Adjusting settings determines how much influence the reference has on the new clip.
  • By setting a lower influence value, the output closely resembles the reference clip; higher values reduce this effect. Additional effects like reverb and ping-pong delay can be added before generating.

Instrument Replacement

  • Users can replace original instruments by specifying new ones in prompts. This feature enhances creativity by allowing complete transformation of sounds.
  • The interface is user-friendly for downloading and utilizing various clips within a DAW, making it accessible for music composition.

Comparison with ComfyUI

  • While there is a ComfyUI node available for Foundation One, it currently lacks the style transfer capability offered by the Gradio interface preferred by the speaker.

Limitations and Future Prospects

  • The speaker notes that while Foundation One excels at creating synth sounds, other instrument outputs may not sound as realistic. It performs well with arpeggios but struggles with melodies or ambient samples.
  • Encouragement is given to explore other music generators featured on their channel, highlighting ongoing developments in AI tools for music creation.
Video description

Foundation-1 installation tutorial. AI for music samples, stems, loops. Follows key, bar count, and BPM. #ai #aimusic #aisong #suno

Thanks to our sponsor Abacus AI. Try ChatLLM & DeepAgent here: http://chatllm.abacus.ai/?token=aisearch

Foundation 1: https://huggingface.co/RoyalCities/Foundation-1
RC Stable Audio Tools: https://github.com/RoyalCities/RC-stable-audio-tools
Foundation 1 for ComfyUI: https://github.com/Saganaki22/ComfyUI-Foundation-1
Git: https://git-scm.com/install/windows
Conda: https://www.anaconda.com/docs/getting-started/miniconda/install/overview#anaconda-website

Other open source AI music models:
Heartmula: https://youtu.be/54YB-hjZDR4
ACE Step 1.5: https://youtu.be/QzddQoCKKss

0:00 Foundation 1 intro
0:32 Foundation-1 demos
4:58 Full music production example
8:20 Generating MIDI files
9:30 Production continued
12:12 ChatLLM
13:19 Additional specs and guidelines
15:33 How to install Foundation-1
16:32 Git
17:30 Installation continued
18:38 Conda
21:24 Installation continued
24:08 How to use Foundation-1
28:12 Sample to MIDI
28:58 Style transfer
31:19 ComfyUI workflow

Learn AI: https://ai-search.io/courses
Newsletter: https://aisearch.substack.com/
Find AI tools & jobs: https://ai-search.io/
Support: https://ko-fi.com/aisearch

Here's my equipment, in case you're wondering:
Lenovo Thinkbook: https://amzn.to/4jWeKwH
Dell Precision 5690: https://www.dell.com/en-us/dt/ai-technologies/index.htm?utm_source=AISearchTools&utm_medium=youtube&utm_campaign=precisionai#tab0=0
GPU: Nvidia RTX 5000 Ada https://nvda.ws/3zfqGqS
Mic: Shure SM7B https://amzn.to/3DErjt1
Audio interface: Scarlett Solo https://amzn.to/3qELMeu