OpenAI Dropped a FRONTIER Open-Source Model

OpenAI Releases gpt-oss: A Game-Changer in Open-Source AI

Overview of gpt-oss

  • OpenAI has launched gpt-oss, a state-of-the-art open-weight model family that may be the previously rumored Horizon Alpha.
  • The family comes in two sizes, 120 billion parameters (gpt-oss-120b) and 20 billion parameters (gpt-oss-20b), both released as open-weight language models.

Benefits of Open Source

  • Open-source models are significantly cheaper than closed-source alternatives and allow for customization through fine-tuning.
  • Released under an Apache 2.0 license, these models provide a permissive framework for use and modification.

Performance Insights

  • The 120B version of gpt-oss performs comparably to OpenAI's o4-mini on reasoning benchmarks while being efficient enough to run on a single 80 GB GPU.
  • The 20 billion parameter version can operate on edge devices with just 16 GB of memory, making it suitable for local inference.

Practical Applications

  • Users are encouraged to download the model for offline access to knowledge, ensuring availability during internet outages or emergencies.
  • Both models excel in tool use, function calling, chain-of-thought reasoning, and health diagnostics.
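The tool-use support follows the OpenAI-compatible chat format. As a minimal sketch (the `get_weather` tool and the `gpt-oss-20b` model identifier below are illustrative assumptions, not details from the release), a function-calling request payload might be assembled like this:

```python
import json

# Hypothetical tool schema in the OpenAI-compatible function-calling format.
# The function name and parameters are illustrative, not from the release.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

def build_request(user_message: str) -> dict:
    """Assemble a chat-completion payload that offers the model one tool."""
    return {
        "model": "gpt-oss-20b",  # assumed model identifier
        "messages": [{"role": "user", "content": user_message}],
        "tools": [get_weather_tool],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }

payload = build_request("What's the weather in Berlin?")
print(json.dumps(payload, indent=2))
```

Sending a payload like this to any OpenAI-compatible inference server should let the model decide whether to emit a `get_weather` tool call or answer directly.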

Unique Features

  • Users can adjust the reasoning depth during problem-solving tasks based on complexity requirements.
  • Trained using advanced techniques focused on efficiency and usability across various deployment environments.
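This adjustable depth is typically described as a reasoning-effort setting passed in the system prompt. The exact wording below ("Reasoning: high") and the model identifier are assumptions for the sake of the sketch:

```python
def build_payload(question: str, effort: str = "medium") -> dict:
    """Build a chat request that selects the model's reasoning depth.

    gpt-oss is described as reading the desired effort level from the
    system prompt: "low" trades depth for speed, "high" does the opposite.
    The prompt format and model identifier here are assumptions.
    """
    if effort not in {"low", "medium", "high"}:
        raise ValueError("effort must be 'low', 'medium', or 'high'")
    return {
        "model": "gpt-oss-120b",  # assumed identifier
        "messages": [
            {"role": "system", "content": f"Reasoning: {effort}"},
            {"role": "user", "content": question},
        ],
    }

payload = build_payload("Prove that sqrt(2) is irrational.", effort="high")
print(payload["messages"][0])
```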

Technical Specifications

  • Each model employs a transformer architecture with a mixture of experts approach that optimizes active parameters per token processed.
  • The larger model activates roughly 5.1 billion parameters per token while maintaining high throughput; the smaller version activates about 3.6 billion.
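The routing idea can be sketched in a few lines: a router scores each token against every expert, only the top-k experts run, and their outputs are mixed by softmax weights. Everything below (dimensions, expert count, k, and the toy experts themselves) is illustrative, not the released architecture:

```python
import numpy as np

def moe_forward(x, router_w, experts, k=2):
    """Toy mixture-of-experts layer: route each token to its top-k experts.

    x:        (tokens, d) activations
    router_w: (d, n_experts) router weight matrix
    experts:  list of callables, one per expert
    Only k experts run per token, so the active parameter count per token
    is a small fraction of the total -- the property described above.
    """
    logits = x @ router_w                    # (tokens, n_experts) router scores
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        idx = np.argsort(logits[t])[-k:]     # indices of the top-k experts
        w = np.exp(logits[t, idx] - logits[t, idx].max())
        w /= w.sum()                         # softmax over the selected experts
        for j, wj in zip(idx, w):
            out[t] += wj * experts[j](x[t])  # weighted mix of expert outputs
    return out

# Tiny demo: 6 toy experts that just scale their input by different factors.
rng = np.random.default_rng(0)
d, n_experts = 8, 6
experts = [lambda v, s=s: v * (s + 1) for s in range(n_experts)]
x = rng.normal(size=(3, d))
router_w = rng.normal(size=(d, n_experts))
y = moe_forward(x, router_w, experts, k=2)
print(y.shape)  # (3, 8)
```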

Training Methodology

  • The models use alternating dense and locally banded sparse attention patterns, similar to GPT-3, for improved inference and memory efficiency.
  • They were trained on high-quality text datasets emphasizing STEM, coding, and general knowledge, using a tokenizer extended from those of earlier OpenAI models.
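The alternating attention pattern can be illustrated with a toy mask builder: sparse layers attend only within a local causal window, dense layers attend over the full causal prefix. The layer parity and window size below are arbitrary illustrative choices, not the released values:

```python
import numpy as np

def attention_mask(seq_len, layer_idx, window=4):
    """Causal attention mask for one layer of an alternating-pattern stack.

    Even layers use a banded (sliding-window) pattern that attends only to
    the most recent `window` positions; odd layers are fully dense causal.
    Parity assignment and window size are illustrative assumptions.
    """
    i = np.arange(seq_len)[:, None]   # query positions
    j = np.arange(seq_len)[None, :]   # key positions
    causal = j <= i                   # never attend to the future
    if layer_idx % 2 == 0:            # sparse layer: local window only
        return causal & (i - j < window)
    return causal                     # dense layer: full causal attention

sparse = attention_mask(6, layer_idx=0, window=2)
dense = attention_mask(6, layer_idx=1)
print(sparse.sum(), dense.sum())  # sparse mask allows fewer positions
```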

Performance Benchmarks of AI Models

Benchmarking Results

  • With tools, the 120 billion parameter version scored 2622 in the Codeforces coding competition benchmark, closely trailing the frontier model (o3) at 2706. This indicates strong performance across model sizes.
  • On expert-level questions, the 120 billion parameter version scored 19%, while o3 reached 24.9%. Notably, the open-source 120B version outperformed both o4-mini and o3-mini without tools.
  • On medical benchmarks like HealthBench, scores were comparable: the 120B model scored 57.6 against the frontier's 59.8. Even the smaller model showed impressive results, with the 20 billion parameter version scoring 96% on the AIME 2024 math competition.
  • The GPQA Diamond benchmark for PhD-level science yielded scores of 80.1 for the 120B model and slightly lower at 71.5 for the smaller variant, indicating robust capabilities even in advanced academic contexts.
  • MMLU results showed high accuracy with scores reaching up to 90% for larger models and respectable performances from smaller ones, reinforcing their competitive edge.

Safety Considerations

  • Monitoring a reasoning model's chain of thought can help detect misbehavior; however, direct supervision was avoided during training to maintain authenticity in responses.
  • Developers are advised against displaying raw chains of thought to users due to potential hallucinations or harmful content; instead, summarization and filtering are recommended practices.
  • Pre-training involved filtering out harmful data related to sensitive topics like chemical and biological threats; however, concerns remain about adversaries fine-tuning open-source models for malicious purposes.
  • Testing indicated that even after extensive fine-tuning aimed at malicious use cases (e.g., biological weapons), these models did not achieve high capability levels according to OpenAI’s preparedness framework.
  • OpenAI is hosting a red-teaming challenge with a $500,000 prize fund, inviting experts to identify safety issues in these models.
Video description

Try gpt-oss, the best open-source coding model, on Together AI, the platform for AI engineers and production-ready inference. Visit http://bit.ly/41tgRka and start building today!

Download The Matthew Berman Vibe Coding Playbook (free): https://bit.ly/3I2J0YQ
Download Humanity's Last Prompt Engineering Guide (free): https://bit.ly/4kFhajz
Join My Newsletter for Regular AI Updates: https://forwardfuture.ai
Discover The Best AI Tools: https://tools.forwardfuture.ai

My Links:
  • X: https://x.com/matthewberman
  • Instagram: https://www.instagram.com/matthewberman_ai
  • Discord: https://discord.gg/xxysSXBxFW

Media/Sponsorship Inquiries: https://bit.ly/44TC45V

Disclaimer: I am a small investor in CrewAI and LMStudio.

Links: https://openai.com/index/introducing-gpt-oss/