Breakthrough: Run Massive Models on Any Device (e.g., LLaMA 65B)
Introduction to Petals: Decentralized AI
In this section, the speaker introduces Petals, a decentralized method of running and fine-tuning large language models. The speaker explains the significance of this advancement in artificial intelligence and highlights the challenges faced with existing models.
Advancement in Artificial Intelligence
- This could be the biggest advancement in artificial intelligence since the Transformer architecture.
- Petals allows large language models to run on almost any device at practical speeds.
- It builds on peer-to-peer technology that has been around for decades.
Large Language Models and Challenges
- Large language models, such as ChatGPT, are an incredible step toward artificial general intelligence.
- ChatGPT is centralized and closed source, raising privacy, security, cost, and transparency concerns.
- Open-source models like LLaMA, BLOOM, and MPT have been released, but they require modern, expensive hardware to run effectively.
Introducing Petals - Decentralized AI
- Petals is a decentralized method of running and fine-tuning large language models.
- It uses a torrent-like network in which models are broken into blocks stored on individual computers worldwide.
- Contributions from many users combine into a powerful AI network without requiring massive mainframe computers.
- Users can even run Petals on the free tier of Google Colab.
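The torrent-like layout described above can be sketched in plain Python. This is an illustrative simulation, not Petals' actual code: the server names, the block functions, and the routing logic are all invented to show the idea of a client pipelining activations through servers that each hold a slice of the model.

```python
# Illustrative simulation (not the real Petals code): a "swarm" of servers,
# each holding a contiguous slice of a model's transformer blocks, through
# which a client pipelines its activations one hop at a time.

def make_block(scale):
    # Stand-in for one transformer block: here just a scalar transform.
    return lambda x: x * scale

FULL_MODEL = [make_block(s) for s in (2, 3, 5, 7)]  # four toy "blocks"

# Each server volunteers a contiguous slice of blocks, like torrent pieces.
swarm = {
    "server_a": (0, 2),  # holds blocks 0-1
    "server_b": (2, 4),  # holds blocks 2-3
}

def client_forward(x, swarm, model):
    """Route the input through whichever server holds the next blocks."""
    done = 0
    while done < len(model):
        # Find a server whose slice starts where we left off.
        name, (start, end) = next(
            (n, s) for n, s in swarm.items() if s[0] == done
        )
        for block in model[start:end]:
            x = block(x)  # in Petals, this hop happens over the network
        done = end
    return x

# The distributed pass matches a purely local forward pass.
local = 1
for block in FULL_MODEL:
    local = block(local)
assert client_forward(1, swarm, FULL_MODEL) == local  # 2*3*5*7 = 210
```

No single machine holds the full model, yet the client recovers the exact same output as a local run; that is the core trick that lets modest devices serve and use a 65B-parameter model together.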
Benefits and Speeds Achieved by Petals
- Petals achieves high speeds across different models and server configurations.
- It currently reaches five to six tokens per second on the 65-billion-parameter LLaMA model.
- A model of that size exceeds what consumer graphics cards can run directly.
Using Petals as a Client or Server
As a Client
- Clients use the network to train or run their models.
- No need to worry about server architecture or swarm health.
- Running Python code from the Petals library is all that's required.
As a Server
- Servers provide their hardware to help run models on the network.
- Setting up a server is simple with just a few lines of Python code.
- Private swarms can also be created.
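Putting the two roles together, here is a minimal sketch of the usage pattern. The class name `AutoDistributedModelForCausalLM` and the `run_server` command match the Petals documentation, but the model name `bigscience/bloom-petals` is an assumption and may differ from what the public swarm currently serves; imports are guarded so the sketch stays self-contained without the package installed.

```python
# Sketch of the Petals client pattern (requires `pip install petals` and
# network access to a swarm; imports are guarded for illustration).
try:
    from transformers import AutoTokenizer
    from petals import AutoDistributedModelForCausalLM
    HAVE_PETALS = True
except ImportError:
    HAVE_PETALS = False

def generate(prompt, model_name="bigscience/bloom-petals", max_new_tokens=20):
    """Run inference over the swarm. model_name is an assumption; replace
    it with whichever model the swarm you join is actually serving."""
    if not HAVE_PETALS:
        raise RuntimeError("install petals to run this against a real swarm")
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoDistributedModelForCausalLM.from_pretrained(model_name)
    inputs = tokenizer(prompt, return_tensors="pt")["input_ids"]
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0])

# Joining as a server is a one-liner from the shell (check the Petals
# README for the current form and flags):
#   python -m petals.cli.run_server bigscience/bloom-petals
```

Note that the client code reads almost exactly like ordinary Hugging Face `transformers` usage; the distribution across the swarm is hidden behind the model class.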
Potential for Distributed Models and Torrent Computing
- Distributed models and torrent-style computing have potential benefits, especially for architectures like the mixture of experts reportedly used in GPT-4.
- However, the approach relies on people donating idle resources, such as GPU time.
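Why mixture-of-experts suits this setup can be shown with a toy router. This is a hypothetical sketch, not GPT-4's actual design: the point is that each token visits only the one server holding its chosen expert, so adding experts (servers) adds capacity without adding per-token network cost.

```python
# Illustrative mixture-of-experts routing (hypothetical, invented names):
# each expert lives on a different server, and each token only travels to
# the one expert its router selects.

experts = {                       # expert_id -> (server, transform)
    0: ("server_a", lambda x: x + 100),
    1: ("server_b", lambda x: x * 2),
}

def route(token_value):
    # Toy router: even values go to expert 0, odd values to expert 1.
    return token_value % 2

def moe_forward(token_value):
    expert_id = route(token_value)
    server, transform = experts[expert_id]
    return server, transform(token_value)

assert moe_forward(4) == ("server_a", 104)  # even -> expert 0
assert moe_forward(3) == ("server_b", 6)    # odd  -> expert 1
```

Only one server does work per token, which maps naturally onto a swarm of independently owned machines.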
Incentivizing Contribution to the Network
- One way to incentivize contribution is by rewarding compute power using blockchain technology.
- Contributing servers could receive rewards granting priority access or a larger share of the network's total capacity.
- These rewards could potentially be traded for monetary value.
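The reward scheme is speculative in the source, so the sketch below is equally hypothetical: an invented `CreditLedger` in which contributed GPU time earns credits that buy queue priority. Nothing like this exists in Petals today; the class, rate, and numbers are all made up for illustration.

```python
# Hypothetical incentive sketch (not implemented in Petals): servers earn
# credits proportional to compute contributed, and credits buy priority.
from collections import defaultdict

class CreditLedger:
    def __init__(self):
        self.credits = defaultdict(float)

    def record_contribution(self, server, gpu_seconds, rate=1.0):
        # rate = credits awarded per GPU-second (an invented parameter)
        self.credits[server] += gpu_seconds * rate

    def priority(self, server):
        # Higher credit balance -> earlier place in the request queue.
        return self.credits[server]

ledger = CreditLedger()
ledger.record_contribution("alice", 3600)  # one GPU-hour
ledger.record_contribution("bob", 600)     # ten GPU-minutes

queue = sorted(["bob", "alice"], key=ledger.priority, reverse=True)
assert queue == ["alice", "bob"]  # alice contributed more, so goes first
```

A blockchain version would make such balances transferable, which is where the possible monetary value mentioned above would come in.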
Current Support and Ease of Use
- Petals currently supports the open-source BLOOM and LLaMA model families.
- It is easy to use with just a few lines of code for inference and fine-tuning.