Qwen3 is a fantastic open-source model

Qwen 3: A New Era in Open-Source AI Models

Overview of the Qwen 3 Model

  • Qwen 3 is an open-source model comparable to Gemini 2.5 Pro, featuring a flagship version with 235 billion total parameters and 22 billion active parameters.
  • Benchmark comparisons show that while Gemini 2.5 Pro leads slightly in some areas, Qwen 3 performs competitively across various metrics, including Elo ratings.

Performance Benchmarks

  • The BFCL (Berkeley Function-Calling Leaderboard) benchmark indicates Qwen 3's superior function-calling ability, with a score of 70.8 versus Gemini's 62.9.
  • The model also outperforms previous generations and other competitors like GPT-4o in multiple benchmarks, showcasing its advanced capabilities.

Hybrid Thinking Model

  • Qwen 3 introduces a hybrid thinking model that lets users adjust the "thinking budget," trading token usage for performance.
  • In non-thinking mode, the model provides quick responses for simpler tasks; in thinking mode, it takes time for deeper reasoning on complex problems.
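The budget mechanism can be sketched in plain Python. This is an illustrative client-side approximation only, not Qwen's actual serving code: it caps the number of tokens the model may spend inside its `<think>` block.

```python
# Illustrative sketch only (not Qwen's serving code): enforce a "thinking
# budget" by capping the tokens emitted inside the <think>...</think> block.
def apply_thinking_budget(stream, budget):
    """Yield tokens from `stream`, dropping reasoning tokens past `budget`."""
    thinking = False
    spent = 0
    for tok in stream:
        if tok == "<think>":
            thinking = True
        elif tok == "</think>":
            thinking = False
        elif thinking:
            if spent >= budget:
                continue  # budget exhausted: skip further reasoning tokens
            spent += 1
        yield tok

tokens = ["<think>", "step1", "step2", "step3", "</think>", "final answer"]
print(list(apply_thinking_budget(tokens, budget=2)))
# -> ['<think>', 'step1', 'step2', '</think>', 'final answer']
```

With `budget=0` this degenerates to non-thinking mode: the `<think>` block is emitted empty and the model's quick answer passes through untouched.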

Task-Specific Budget Control

  • Users can configure task-specific budgets easily, optimizing the balance between cost-efficiency and inference quality.
  • This flexibility is particularly beneficial for coding tasks where varying levels of thought are required depending on complexity.
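One way to wire this up is a per-task lookup table. The task names and token counts below are illustrative assumptions, not recommended values:

```python
# Hypothetical per-task thinking budgets (token counts are illustrative
# assumptions): instant replies for simple tasks, deep reasoning for hard ones.
TASK_BUDGETS = {
    "chitchat": 0,          # non-thinking mode: quick response
    "summarize": 1_024,
    "code_simple": 2_048,
    "code_complex": 16_384,  # complex coding gets the largest budget
}

def thinking_budget(task: str, default: int = 4_096) -> int:
    """Look up the thinking budget for a task type, falling back to a default."""
    return TASK_BUDGETS.get(task, default)

print(thinking_budget("chitchat"))      # -> 0
print(thinking_budget("code_complex"))  # -> 16384
```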

Integration with MCP Tools

  • Qwen 3 is optimized for use with MCP (Model Context Protocol) tools through Zapier's new service, which connects AI to thousands of applications seamlessly.
  • Users can set up automation without coding skills using Zapier’s platform, making it accessible for various applications.
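A typical MCP client configuration looks roughly like the following. The server name, URL, and field layout here are placeholders, not Zapier's actual endpoint or schema:

```python
# Placeholder MCP client configuration (illustrative; the URL and schema
# are assumptions, not Zapier's actual endpoint or format).
mcp_config = {
    "mcpServers": {
        "zapier": {
            # Hypothetical hosted MCP endpoint for this client
            "url": "https://example.invalid/mcp/sse",
        }
    }
}

print(sorted(mcp_config["mcpServers"].keys()))  # -> ['zapier']
```

An MCP-aware client would read a config like this, connect to the listed server, and expose whatever tools the server advertises to the model at inference time.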

Model Variants and Specifications

  • The Qwen 3 family includes two Mixture-of-Experts models and six dense models; the flagship pairs a large total parameter count with a much smaller active count for efficiency.

Introduction to Qwen 3 Models

Overview of Model Parameters and Capabilities

  • The Qwen 3 dense models range from 600 million to 32 billion parameters, with varying context windows: 128K for the larger models (8B-32B) and 32K for the smaller ones (0.6B-4B).
  • Notably, the model demonstrates tool calling during chain of thought, a feature previously seen in OpenAI's o3 and o4 series.
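The size-to-context mapping above can be captured in a small lookup table. The intermediate dense sizes (1.7B, 14B) are assumed members of the six-model lineup; the 32K/128K figures come from the summary above:

```python
# Context windows per the summary above: "32K" for the smaller dense models
# (0.6B-4B), "128K" for the larger ones (8B-32B). Intermediate sizes are
# assumed members of the six-model dense lineup.
CONTEXT_WINDOW = {
    "0.6B": "32K", "1.7B": "32K", "4B": "32K",
    "8B": "128K", "14B": "128K", "32B": "128K",
}

print(CONTEXT_WINDOW["32B"])  # -> 128K
print(CONTEXT_WINDOW["4B"])   # -> 32K
```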

Demonstration of Tool Calling

  • In a demo task, the model fetches GitHub stars and generates a bar chart while seamlessly switching between thinking and tool calls.
  • Another example shows the model organizing desktop files by type through multiple tool calls within a single inference run.
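The interleaving described above can be sketched as a minimal agent loop. This is illustrative only; the message format, action schema, and tool registry are assumptions, not Qwen's actual API:

```python
# Minimal agent-loop sketch (illustrative, not Qwen's API): the runtime
# executes each tool call the model emits and feeds the result back,
# looping until the model produces a final answer.
def run_agent(model_step, tools):
    messages = []
    while True:
        action = model_step(messages)
        if action["type"] == "tool_call":
            result = tools[action["name"]](**action["args"])
            messages.append({"role": "tool", "content": result})
        else:
            return action["content"]

# Toy demo with a scripted "model" and a fake GitHub-stars tool.
def fake_model(messages):
    if not messages:  # first step: ask for the tool
        return {"type": "tool_call", "name": "get_stars", "args": {"repo": "qwen3"}}
    return {"type": "final", "content": f"qwen3 has {messages[-1]['content']} stars"}

tools = {"get_stars": lambda repo: 12345}  # stubbed result, not real data
print(run_agent(fake_model, tools))  # -> qwen3 has 12345 stars
```

A real run would replace `fake_model` with a call to the model and `tools` with live integrations; the loop structure stays the same whether one tool call or many occur in a single inference run.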

Pre-training Process of Qwen 3

Data Collection and Training Tokens

  • Qwen 3 was trained on nearly double the tokens of its predecessor: approximately 36 trillion tokens across various languages.
  • The dataset included diverse sources such as web content and PDF-like documents, with earlier Qwen models used for text extraction.

Synthetic Data Generation

  • To enhance math and coding data representation, synthetic data was generated using Qwen 2.5 models specialized in mathematics and coding.

Training Stages Explained

Three Phases of Pre-training

  • The first stage involved pre-training on over 30 trillion tokens to establish basic language skills.
  • The second phase improved knowledge-intensive data representation through an additional training set of five trillion tokens.
  • The final stage used high-quality long-context data to extend the models' context windows.

Post-training Methodology

  • A four-stage training pipeline was implemented post-pre-training to develop reasoning abilities alongside rapid response capabilities.

Stages Breakdown:

  1. Long Chain of Thought: Initial training focused on fundamental reasoning across various domains.
  2. Reinforcement Learning: Enhanced exploration-exploitation capabilities through rule-based rewards.
  3. Model Fusion: Integrated non-thinking capabilities via fine-tuning with instruction-tuning data.
  4. General Reinforcement Learning: Strengthened general capabilities across numerous domain tasks.

Model Availability and Comparisons

Accessing the Model

  • Users can download the model immediately and run it locally via platforms like LM Studio, Ollama, or MLX.

Benchmark Comparisons with Llama Models

  • Qwen 3's flagship model (235B parameters) is compared against Llama 4's frontier model (402B parameters); despite the size gap, the comparison highlights how few parameters Qwen 3 activates per token.
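A quick back-of-envelope calculation shows how sparse the flagship MoE is per token, using the figures quoted above:

```python
# Fraction of the flagship MoE's parameters that are active per token,
# using the 235B-total / 22B-active figures from the summary above.
total_params = 235e9   # 235B total
active_params = 22e9   # 22B active
print(f"{active_params / total_params:.1%}")  # -> 9.4%
```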

Gemini 2.5 and Its Competitors

Overview of Model Performance

  • Gemini 2.5 is identified as a substantial leader in the AI model landscape, achieving an impressive performance score of 84%.
  • Second place is held by OpenAI's o3, closely followed by DeepSeek R1 and Llama 3.1 Nemotron Ultra, Nvidia's fine-tuned variant of the previous-generation Llama.

Analysis of Active Parameters

  • Models are compared on GPQA Diamond performance versus their active parameter counts.
  • Qwen 3's smaller MoE model, with only 3 billion active parameters, sits lower on the chart but still shows potential as a reasoning model.
  • The chart plots active parameters on the X-axis (fewer, toward the left, is better) against GPQA Diamond score on the Y-axis (higher is better).
Video description

Disclosure: I am a small investor in LM Studio.

Links:
  • https://x.com/Alibaba_Qwen/status/1916962087676612998
  • https://qwenlm.github.io/blog/qwen3/
  • https://x.com/ArtificialAnlys/status/1917246369510879280