NEW "Orca" š³ Open-Source Model Surprised Everyone.
Battle between Large Foundational Models and Smaller Open Source Models
In this section, the speaker introduces the battle between large foundational models and smaller open source models in the world of artificial intelligence.
Open Source Models Iterating Quickly
- A leaked internal memo from Google called "We Have No Moat" highlights how open source models are iterating so quickly that large foundational models like those from Google and OpenAI are at risk.
- Any developer can get their hands on these models, and new techniques to train and fine-tune them are coming out every day.
Challenges to Value of Open Source Models
- A research paper released a couple of weeks ago claimed to disprove a lot of the value that these open source smaller models have.
- The paper "The False Promise of Imitating Proprietary LLMs" challenges the assertion that open source models can truly understand the logic to reach certain outputs.
Orca Paper
- Microsoft Research released a new research paper called "Orca: Progressive Learning from Complex Explanation Traces of GPT-4".
- Orca is a technique that makes smaller open source models extremely powerful, challenging the idea that they can only imitate answers.
Enhancing Capabilities of Smaller Models through Imitation Learning
This section discusses recent research focused on enhancing the capability of smaller models through imitation learning, drawing on the outputs generated by large foundational models.
Limitations of Imitation Techniques
- The paper "Orca - Progressive Learning from Complex Explanation Traces of GPT-4" outlines the limitations of these imitation techniques, including limited imitation signals and a lack of rigorous evaluation resulting in overestimating the small model's capability.
- Open source models tend to learn to imitate the style but not the reasoning process of large foundational models.
Orca Outperforms Other Open Source Models
- Orca outperforms every other open source model and even outperforms ChatGPT (GPT-3.5) on many different benchmarks.
- Orca challenges the idea that open source models can only imitate answers and get thrown off by any variation in the prompt.
Orca Model: Learning from Rich Signals
In this section, the speaker introduces the Orca model and explains how it learns from rich signals from GPT-4, including step-by-step thought processes and other complex instructions.
Orca Model
- The Orca model is a language model with only 13 billion parameters, small enough to run on modern consumer hardware.
- It learns from rich signals from GPT-4, including explanation traces, step-by-step thought processes, and other complex instructions.
- Orca surpasses conventional state-of-the-art instruction-tuned models such as Vicuna-13B by more than 100% on complex zero-shot reasoning benchmarks like Big-Bench Hard (BBH) and AGIEval.
Guided by Teacher Assistance
- The key technique used by the Orca model is teacher assistance from ChatGPT.
- It is a two-tier teaching process: ChatGPT (GPT-3.5) first serves as an intermediate teacher, providing a large number of examples (5 million) to learn from.
- They then boil those 5 million examples down to the most important one million and use GPT-4 to continue training on more complex examples (see the sketch after this list).
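A minimal sketch of what this two-tier curriculum could look like in code, assuming a standard supervised fine-tuning loop; the dataset paths, `Stage` type, and `train()` helper below are illustrative placeholders, not the paper's actual pipeline:

```python
# Hypothetical sketch of Orca-style progressive learning: fine-tune first on
# the large ChatGPT-taught set, then on the smaller, harder GPT-4-taught set.
# Dataset paths and the train() helper are illustrative placeholders.
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    dataset_path: str  # JSONL of (system message, prompt, explanation) triples
    epochs: int

CURRICULUM = [
    # Tier 1: ~5M examples answered by ChatGPT (GPT-3.5), the easier teacher.
    Stage("chatgpt_teacher", "data/chatgpt_5m.jsonl", epochs=3),
    # Tier 2: ~1M examples answered by GPT-4, with richer explanation traces.
    Stage("gpt4_teacher", "data/gpt4_1m.jsonl", epochs=4),
]

def train(model, dataset_path: str, epochs: int) -> None:
    """Placeholder for an ordinary supervised fine-tuning loop."""
    ...

def progressive_finetune(model):
    # Easier examples first, harder examples second (curriculum learning).
    for stage in CURRICULUM:
        train(model, stage.dataset_path, stage.epochs)
    return model
```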
Learning from Step-by-Step Explanations
- Rather than learning from prompt-response pairs, the foundational models are asked to explain their reasoning step-by-step.
- This technique is called explanation tuning: the training data is not just the prompt and answer, but an explanation of the reasoning and logic for how ChatGPT and GPT-4 arrived at the answer (see the record sketch after this list).
- The authors note that learning from step-by-step explanations is a promising direction for improving model capabilities.
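As a rough illustration of what an explanation-tuning training record might contain (the field names and example are assumptions, not the paper's actual schema), the target is the teacher's full reasoning trace rather than a bare answer:

```python
# Hypothetical explanation-tuning record: the fine-tuning target is the
# teacher's step-by-step reasoning, not just the final answer.
# Field names and content are illustrative.
record = {
    "system_message": "You are a helpful assistant. Think step by step and "
                      "justify your answer before stating it.",
    "user_prompt": "What is 15% of 80?",
    "teacher_response": (
        "Step 1: 15% means 15/100, which is 0.15.\n"
        "Step 2: Multiply 0.15 by 80 to get 12.\n"
        "Therefore, 15% of 80 is 12."
    ),
}

# During fine-tuning, the student model learns to produce teacher_response
# when given system_message plus user_prompt.
```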
Comparison of Orca Model with Other Models
In this section, the speaker compares the performance of different models in various benchmarks.
Auto Evaluation
- When evaluated by GPT-4 using auto-evaluation, Orca 13B beats ChatGPT, Bard, and the open-source models based on LLaMA.
- For zero-shot problems on academic exams, ChatGPT performs better than Orca 13B.
Big Bench Hard
- Orca 13B achieves parity with ChatGPT on complex zero-shot reasoning tasks in the Big-Bench Hard benchmark.
- The imitation paper's authors assert that model imitation is a false promise, since broadly matching ChatGPT using imitation alone would require (1) a concerted effort to collect enormous imitation datasets and (2) far more diverse and higher quality imitation data than is currently available.
Open Source Models
- Alpaca and WizardLM employ a variant of self-instruct, with WizardLM introducing the concept of Evol-Instruct, which gradually rewrites the initial set of instructions into more complex versions, attempting to overcome some of the method's inherent shortcomings (see the sketch after this list).
- Vicuna and Koala demonstrate remarkable performance thanks to the more human-like conversations and natural instructions in community-contributed datasets like ShareGPT.
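A short sketch of the Evol-Instruct idea mentioned above: an LLM is repeatedly asked to rewrite a seed instruction into a harder variant. The prompt wording and the `call_llm` stub are assumptions, not WizardLM's exact template:

```python
# Hypothetical sketch of Evol-Instruct-style instruction evolution.
def call_llm(prompt: str) -> str:
    """Placeholder: swap in a real LLM API call here."""
    return prompt  # echo stub so the sketch runs end to end

def complicate(instruction: str) -> str:
    # Ask the LLM to rewrite the instruction into a more complex version,
    # e.g. by adding constraints or required reasoning steps.
    prompt = (
        "Rewrite the following instruction so it is more complex, while "
        "keeping it answerable:\n\n" + instruction
    )
    return call_llm(prompt)

def evolve(seed: str, rounds: int = 3) -> list[str]:
    """Evolve one seed instruction into progressively harder versions."""
    versions = [seed]
    for _ in range(rounds):
        versions.append(complicate(versions[-1]))
    return versions
```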
Key Contributions of Orca Model
In this section, the speaker discusses three key contributions of the Orca model.
Explanation Tuning
- The first contribution is explanation tuning where models are fine-tuned based on step-by-step explanations of reasoning and logic for arriving at solutions.
Teacher Assistance
- The second contribution is teacher assistance from ChatGPT, a two-tier teaching process used to train models on progressively more complex examples.
Promising Direction
- The third contribution is that learning from step-by-step explanations is a promising direction to improve model capabilities.
Explanation Tuning
The speaker discusses how Orca excels by utilizing a large dataset of tasks and instructions answered by GPT-4 for training. They use explanation tuning to force GPT-4 to put forth its reasoning and logic in the response itself, which is then used for training.
Scaling Tasks and Instructions
- Open source models use highly limited datasets, but Orca draws on the FLAN-v2 collection, which has tens of millions of instructions.
- Orca uses 5 million tasks and instructions, many times more than all other open-source models.
- GPT-4 provides step-by-step explanations of how it figures out, for example, the median of a list of numbers when guided by a system instruction (see the sketch after this list).
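For intuition, the median procedure that GPT-4 is asked to explain step by step maps onto an explicit algorithm like the following (a plain illustration, not code from the paper):

```python
# Illustrative step-by-step median computation, mirroring the kind of
# explanation trace GPT-4 produces for this task.
def median(values: list[float]) -> float:
    ordered = sorted(values)   # Step 1: sort the numbers
    n = len(ordered)
    mid = n // 2               # Step 2: find the middle position
    if n % 2 == 1:
        return ordered[mid]    # Step 3a: odd count -> take the middle value
    # Step 3b: even count -> average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 5, 3]))  # 4.0
```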
Evaluation Techniques
- Orca uses auto-evaluation with GPT-4, asking it which of two candidate responses is better (see the judging sketch after this list).
- They also use academic benchmarks like Big-Bench Hard and TruthfulQA, as well as professional and academic exams like the SAT, LSAT, etc.
- A safety evaluation based on ToxiGen, checking whether responses contain toxic language, is also used.
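A minimal sketch of GPT-4-as-judge auto-evaluation, assuming the current OpenAI Python client; the judging prompt is an assumption about the general approach, not the paper's exact protocol:

```python
# Hypothetical GPT-4 auto-evaluation: ask GPT-4 to pick the better of two
# candidate responses. The judge prompt wording is illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def judge(question: str, answer_a: str, answer_b: str) -> str:
    prompt = (
        "You are an impartial judge. Given the question and two candidate "
        "answers, reply with exactly 'A' or 'B' for the better answer.\n\n"
        f"Question: {question}\n\nAnswer A: {answer_a}\n\nAnswer B: {answer_b}"
    )
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content.strip()
```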
System Messages
The speaker explains that system messages are the main tool used to get ChatGPT and GPT-4 to provide step-by-step explanations.
Prompting Techniques
- Modern prompting techniques like chain-of-thought are used to coax ChatGPT and GPT-4 into explaining their reasoning (see the sketch after this list).
- System messages are set when using the ChatGPT playground or API.
- ChatGPT breaks down the differences between two sentences and determines which one is not logical.
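A brief sketch of setting a system message via the OpenAI chat API to elicit this kind of step-by-step breakdown; the system message wording is an assumption (the paper uses a collection of such instructions):

```python
# Minimal example of a system message coaxing step-by-step reasoning,
# mirroring the "which sentence is not logical" task described above.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

completion = client.chat.completions.create(
    model="gpt-4",
    messages=[
        # The system message steers the model toward explicit reasoning.
        {"role": "system",
         "content": "Explain your reasoning step by step before giving the final answer."},
        {"role": "user",
         "content": "Which sentence is not logical? "
                    "(A) He put an elephant in the fridge. "
                    "(B) He put milk in the fridge."},
    ],
)
print(completion.choices[0].message.content)
```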
GPT4's Answer
- GPT-4 provides a much more detailed and verbose answer than a typical response.
Progressive Learning with ChatGPT
The speaker discusses the concept of progressive learning, or curriculum learning, where a student learns from easier examples followed by harder ones. They explain how this technique can be used to bring AI models toward the GPT-4 level by using ChatGPT as an intermediate step.
- Using ChatGPT as an intermediate step helps AI models perform much better.
- Orca performs significantly better than Vicuna on tasks like the LSAT and SAT.
- Orca performs similarly to ChatGPT but lags behind GPT-4.
- Orca performs substantially better than Vicuna and, across all tasks, even performs better than ChatGPT.
Cost and Time Efficiency of GPT-3.5 Turbo
The speaker explains why they use GPT-3.5 Turbo (ChatGPT) instead of GPT-4 for most examples, for cost and time efficiency reasons.
- GPT-3.5 Turbo is faster and less expensive than GPT-4.
- They use 5 million examples with ChatGPT and 1 million examples with GPT-4 (a rough cost sketch follows below).
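As a rough back-of-the-envelope illustration of the cost asymmetry (every number below is an assumption for illustration, not a figure from the paper or the video):

```python
# Illustrative cost comparison. ALL numbers are assumptions: average tokens
# per example and per-token prices stand in for the real figures.
AVG_TOKENS_PER_EXAMPLE = 500      # assumed prompt + response length

CHATGPT_PRICE_PER_1K = 0.002      # assumed $/1K tokens for GPT-3.5 Turbo
GPT4_PRICE_PER_1K = 0.045         # assumed blended $/1K tokens for GPT-4

def est_cost(n_examples: int, price_per_1k: float) -> float:
    return n_examples * AVG_TOKENS_PER_EXAMPLE / 1000 * price_per_1k

print(f"5M ChatGPT examples: ~${est_cost(5_000_000, CHATGPT_PRICE_PER_1K):,.0f}")
print(f"1M GPT-4 examples:   ~${est_cost(1_000_000, GPT4_PRICE_PER_1K):,.0f}")
# Even with 5x fewer examples, the GPT-4 pass dominates the per-token cost.
```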
Open Source Models Continue to Improve
The speaker finds it fascinating that open source models continue to improve at a rapid clip, with new techniques for fine-tuning and training coming out every day.
- Open source models continue to get better and cheaper over time.
- GPT-4 still seems to have some secret sauce and performs much better than any other model.
- Microsoft Research driving gains in open source is remarkable, since Microsoft owns a significant stake in OpenAI.