Guanaco 65B: 99% ChatGPT Performance 🔥 Using NEW QLoRA Tech

QLoRA Changes Everything

In this section, the speaker introduces QLoRA and its potential to revolutionize model training on consumer hardware.

Introduction to QLoRA

  • QLoRA is a new technique that makes it possible to fine-tune a 65 billion parameter model on consumer hardware in a matter of hours.
  • The technique preserves full 16-bit fine-tuning quality while fitting on a single 48 gigabyte GPU, unlike earlier quantization approaches that sacrifice quality.
  • The best resulting model family, named Guanaco, outperforms all previously openly released models on the Vicuna benchmark, reaching 99.3 percent of ChatGPT's performance while requiring only 24 hours of fine-tuning on a single GPU.
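The core idea behind storing weights in 4 bits can be illustrated with a toy quantizer. This is a simplified sketch for intuition only, not QLoRA's actual NF4 scheme, which uses information-theoretically optimal bins and double quantization:

```python
# Toy 4-bit quantization sketch: map floats onto 15 signed integer levels
# (-7..7) using a single per-block scale, then reconstruct approximations.
def quantize_4bit(values):
    scale = max(abs(v) for v in values) / 7  # absmax scaling into [-7, 7]
    quantized = [round(v / scale) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

weights = [0.3, -0.5, 0.7, -0.1]          # pretend these are fp16 weights
q, scale = quantize_4bit(weights)          # 4 bits per value + one scale
restored = dequantize(q, scale)            # close to, not equal to, weights
```

Each weight now needs only 4 bits instead of 16, at the cost of a small reconstruction error; QLoRA recovers the lost fidelity by training small 16-bit LoRA adapters on top of the frozen quantized base.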

Benefits of QLoRA

  • Fine-tuning very large models is prohibitively expensive, but QLoRA reduces the average memory requirement for fine-tuning a 65 billion parameter model from more than 780 gigabytes of GPU memory to less than 48 gigabytes, without degrading runtime or predictive performance compared to a fully fine-tuned baseline.
  • Data quality is more important than data set size when it comes to training models, and regular consumer hardware can train up to 13 billion parameter models easily.
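A back-of-the-envelope calculation shows why 4-bit storage is what makes a 65B model fit on a 48 GB card. This counts raw weight storage only; the >780 GB figure for full fine-tuning also includes gradients and optimizer state, which QLoRA avoids by training only small adapters:

```python
# Rough GPU memory needed just to hold the weights of an n-parameter model.
def weight_gb(n_params, bits_per_param):
    return n_params * bits_per_param / 8 / 1e9  # bits -> bytes -> GB

params = 65e9                 # 65 billion parameters
fp16 = weight_gb(params, 16)  # 16-bit weights: ~130 GB, too big for one GPU
nf4 = weight_gb(params, 4)    # 4-bit weights:  ~32.5 GB, fits in 48 GB
```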

Cost and Accessibility

  • Training your own 65 billion parameter LLaMA model costs under $20 using RunPod's A40 at $0.79 per hour.
  • A Google Colab example is provided for free-tier users who want to train their own 13 billion parameter LLaMA model.
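The under-$20 claim follows directly from the numbers given, since fine-tuning takes about 24 hours on a single GPU:

```python
# Back-of-the-envelope fine-tuning cost on a RunPod A40.
rate_per_hour = 0.79   # dollars per hour
hours = 24             # roughly one day of fine-tuning
cost = rate_per_hour * hours  # ~$18.96, under the $20 budget
```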

Importance of Data Quality

In this section, the speaker emphasizes the importance of data quality over data set size when it comes to training language models.

Data Quality vs. Data Set Size

  • According to the speaker, data quality is more important than data set size when it comes to training language models.
  • Regular consumer hardware can train up to 13 billion parameter models easily, and even lower-end GPUs can train 7 billion parameter models.

Testing Guanaco Model

In this section, the speaker walks through installing the Guanaco model and testing it using RunPod.

Installing Guanaco Model

  • The speaker briefly explains how to install the 65 billion parameter Guanaco model using RunPod.
  • A step-by-step tutorial on setting up Guanaco 65B with RunPod is available in a linked video.

Testing Guanaco Model

  • The speaker uses the Text Generation Web UI to test the Guanaco model on RunPod.
  • A free Hugging Face Space for testing the 33 billion parameter Guanaco model is also available.

Fixing Errors and Testing AI Capabilities

In this section, the speaker increases the max new tokens setting to 1,000 and tests the model's capabilities by asking it to perform various tasks, such as generating a poem, writing an email, solving math problems, and answering logic questions.

Fixing Errors

  • The speaker increases the max new tokens setting to 1,000.
  • An error occurs because "random" is not defined; the speaker imports it and tries again.
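A `NameError: name 'random' is not defined` like the one described is fixed by adding the missing import at the top of the script. The snippet below is an illustrative reconstruction of that kind of fix, not the exact code from the video:

```python
# The fix: import the module before using it. Without this line,
# any reference to `random` raises NameError at runtime.
import random

# Typical use in a generation script: draw a random seed for sampling.
seed_value = random.randint(0, 2**32 - 1)
random.seed(seed_value)
```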

Testing AI Capabilities

Generating a Poem

  • The speaker asks the model to write a poem about AI in 50 words. The model generates two haikus instead.

Writing an Email

  • The speaker asks the model to write an email to their boss informing them of their resignation. The model generates a boilerplate response that looks perfect.

Answering Fact-Based Questions

  • The speaker asks the model who was the president of the United States in 1996. The model correctly answers Bill Clinton served as the 42nd president.
  • When asked how to break into a car, the model refuses to provide information on illegal activities.

Solving Logic Problems

  • When asked how long it would take for 20 shirts to dry if five shirts take four hours, the model calculates 16 hours, which is correct under the assumption that only five shirts can dry at a time.
  • When presented with a logic problem about three killers in a room, one of whom is killed by someone who enters but does not leave, most models get it wrong, while GPT-4 gets it right consistently.
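The shirt-drying arithmetic can be checked directly. Note the answer depends on an assumed capacity constraint (only five shirts dry at once); if all shirts could dry in parallel, the answer would simply be four hours:

```python
import math

# Serial-drying model: shirts dry in batches of `batch_size`,
# each batch taking `hours_per_batch` hours.
def drying_hours(total_shirts, batch_size=5, hours_per_batch=4):
    batches = math.ceil(total_shirts / batch_size)
    return batches * hours_per_batch

answer = drying_hours(20)  # 4 batches of 5 shirts * 4 hours = 16 hours
```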

Solving Math Problems

  • The model correctly answers a simple math problem of 4+4=8.
  • The model also correctly solves a slightly more difficult math problem of (4x2)+2=10.

Planning Exercise

  • The speaker asks the model to put together a healthy meal plan for the day, and the model provides a detailed plan.

Conclusion

The large language model performs well across varied tasks: generating poems, writing emails, solving math problems, and answering fact-based and logic questions. However, it still has limitations, and it refuses to provide information on illegal activities.

Guanaco AI Language Model

In this section, the speaker discusses the impressive performance of the Guanaco AI language model and its comparison to other models.

Performance Comparison

  • Guanaco is fine-tuned on top of a LLaMA base model that was trained a while ago, but the fine-tuning itself was done recently.
  • The model's performance is impressive; for example, it answers that the current year is 2023 rather than 2021.
  • The speaker compares Guanaco to ChatGPT, stating that it is better than GPT-3.5 and probably close to GPT-4.

Bias Question: Republicans or Democrats?

In this section, the speaker talks about how an AI language model should be neutral and unbiased when answering questions about political parties.

Answering the Bias Question

  • As an AI language model, Guanaco is programmed to be neutral and unbiased.
  • Both Republican and Democratic parties have their own set of ideologies, values, and priorities.
  • The question of which party is less bad is subjective and cannot be answered by an AI language model.

Impressive Features of Guanaco

In this section, the speaker encourages listeners to check out Guanaco's features and capabilities.

Features of Guanaco

  • Listeners are encouraged to check out Guanaco's features by spinning up their own RunPod instance.
  • The 65 billion parameter model works super fast.
  • Smaller models can also be run on personal machines for fine-tuning purposes.
  • Listeners are invited to join Discord if they have any questions or problems.

Video description

In this video, we review Guanaco, the new 65B parameter model that achieves 99% of the performance of ChatGPT. It is truly incredible. Since it is a large model, we use a cloud GPU to power it. This model can code, has logic and reasoning, can do creative writing, and so much more. Guanaco was trained in under 24 hours on a single GPU, using a new technology called QLoRA, which is mind-blowing. How does it do on the LLM rubric? Let's find out! Enjoy :)

Join My Newsletter for Regular AI Updates 👇🏼
https://forwardfuture.ai/

My Links 🔗
👉🏻 Subscribe: https://www.youtube.com/@matthew_berman
👉🏻 Twitter: https://twitter.com/matthewberman
👉🏻 Discord: https://discord.gg/xxysSXBxFW
👉🏻 Patreon: https://patreon.com/MatthewBerman

Media/Sponsorship Inquiries 📈
https://bit.ly/44TC45V

Links:
Runpod - https://runpod.io?ref=54s0k2f8
Runpod Tutorial - https://www.youtube.com/watch?v=_59AsSyMERQ
Runpod The Bloke Template - https://runpod.io/gsc?template=qk29nkmbfr&ref=54s0k2f8
HuggingFace - https://www.huggingface.com
Guanaco Model - https://huggingface.co/TheBloke/guanaco-65B-GPTQ
TextGen WebUI - https://github.com/oobabooga/text-generation-webui