The Industry Reacts to GPT-5

The Industry Reacts to GPT-5

Reactions to the GPT-5 Launch

Overview of Industry Reactions

  • The launch of GPT-5 has sparked polarized opinions, with some users praising it as the best model while others prefer Claude 3.5 or express skepticism about its evaluations.

Feedback from Sam Altman

  • Sam Altman acknowledges that OpenAI underestimated user attachment to GPT-4, noting that even if GPT-5 performs better, many users are upset about retiring GPT-4.

User Preferences and Customization

  • Users have varying opinions on the strengths of GPT-4 versus GPT-5; there is a call for better customization options to cater to different user needs.

Model Rollout and Personality Adjustments

  • OpenAI plans to stabilize the rollout of GPT-5 and make adjustments to enhance its personality, which differs significantly from that of GPT-4.

Independent Benchmarks and Performance Metrics

Benchmarking Insights

  • Artificial Analysis conducted independent benchmarks on GPT-5 across eight evaluation configurations, revealing significant performance insights.

Reasoning Effort Configurations

  • GPT-5 offers four reasoning effort configurations (high, medium, low, minimal), allowing users to adjust how much cognitive effort the model expends per query.

Token Usage Efficiency

  • High reasoning effort uses more tokens than previous models but remains efficient compared to competitors like Gemini 2.5 Pro; minimal reasoning effort shows substantial token efficiency improvements.

Long Context Reasoning and Agentic Capabilities

Long Context Reasoning Performance

  • A new benchmark indicates that GPT-5 excels in long context reasoning tasks, crucial for applications involving extensive codebases where referencing multiple sections is necessary.

Enhancements in Agentic Capabilities

  • OpenAI has improved agentic capabilities by adding features like instruction following and personality assessments through micro-evaluations on their platform.

AI Intelligence Index Rankings

Current Rankings Overview

  • According to artificial analysis's index, GPT-5 ranks highest among various models with scores indicating superior intelligence levels compared to predecessors like Gro 4 and earlier versions of itself.

GraphGate Controversy

Issues with Presentation Graphics

  • There was controversy surrounding inaccuracies in graphs presented during the live stream; discrepancies were noted between reported scores and visual representations.

Human Error Acknowledgment

  • The speaker emphasizes that mistakes can happen during presentations due to human error, similar to how AI models can produce hallucinations.

Introduction of Chat LLM by Abacus AI

Features of Chat LLM

Introduction to AI Models and Deep Agent

Overview of New AI Capabilities

  • The introduction of text-to-video models allows for easy generation of images and videos.
  • Abacus AI has launched "Deep Agent," a versatile AI capable of tasks such as website creation, app development, presentations, research reports, chatbots, and game building.
  • Deep Agent integrates 6 to 10 different LLMs (Large Language Models), including open-source models like Coin Coder.

Chat LLM Features

  • Chat LLM includes the latest frontier models: Opus 4.1, GPTO OSS120B, and GPT5 available at a subscription cost of $10 per month.
  • Users are encouraged to check out the service via chatlm.abacus.ai or through a provided link.

GPT5 Performance Evaluation

Benchmarking Results

  • GPT5 is reported as the top model across various categories in LM Arena evaluations.
  • It ranks first in text processing, web development, vision tasks, coding challenges, math problems, creativity assessments, and handling long queries with an ELO score of 1481.

Comparison with Other Models

  • Gemini 2.5 Pro follows in second place with a score of 1460; Gro 4 ranks lower at fifth place.
  • The speaker emphasizes that traditional benchmarks may not hold significant value anymore.

Post-Evaluation Insights on GPT5

User Experience Over Benchmarks

  • The speaker argues that user experience—how well the model follows instructions and handles context—is more important than benchmark scores.
  • There’s skepticism about the relevance of intelligence benchmarks once they reach saturation points; practical usability becomes paramount.

Community Reactions

  • A contrasting opinion from Theo GG suggests that while he finds GPT5 impressive for its capabilities, others perceive it as underwhelming or flawed.

Critiques and Alternative Perspectives on GPT5

Performance Concerns

  • Stage Hand claims that GPT5 performs worse than other models like Opus 4.1 regarding speed and accuracy.
  • Smaller models reportedly outperform GPT5 in speed but still lag behind Opus 4.1 in accuracy metrics.

User Reviews

  • McKay Wrigley describes GPT5 as excellent for everyday chats but prefers using Cloud Code plus Opus for coding tasks due to performance nuances.

Model Router Functionality Discussion

Model Routing Mechanism

  • The hybrid model router introduced with GPT5 directs users to the most suitable version based on their prompts and use cases.

Personal Preferences

  • Some users express dissatisfaction with this routing feature while others appreciate its efficiency in providing quick answers when needed.

Jailbreaking Attempts on AI Models

Security Challenges

GPT5 and Its Implications in AI Development

Overview of GPT5's Capabilities

  • An intern at LM Arena demonstrated GPT5's ability to create a simplistic Minecraft clone in one attempt, showcasing its efficiency.
  • Boris critiques that while GPT5 is not AGI, OpenAI is focusing on broadening product appeal rather than major innovations, likening it to Apple's approach.

Competitive Landscape

  • Tony Woo from XAI expresses pride in their performance against GPT5 with Gro 4, claiming it outperforms GPT5 in several benchmarks like ARC AGI.
  • Woo emphasizes the rapid development pace at XAI and hints at upcoming models that could challenge existing leaders like OpenAI.

Pricing Strategies

  • Simon Willis highlights the pricing differences: Claude Opus 4.1 costs $15 per million input/output, while Gro 4 is significantly cheaper at $3 per million input and $15 per million output.
  • GPT5 offers an even more competitive price of $1.25 per million input and $10 per million output, which could enhance user adoption due to affordability.

Performance Comparisons

  • A comparison between Kua GPT40 and GPT5 shows that while GPT40 struggles with computer tasks, GPT5 excels by passing all tests effectively.
  • Aiden McLofflin from OpenAI claims that GPT5 surpasses competitors like Claude 4.1 in evaluations and is over five times cheaper than Opus.

User Experiences and Limitations

  • Vos from Meta humorously notes that although GPT5 refactored his codebase impressively, none of the changes worked as intended.
  • Sophie Netcap discusses how users are increasingly consulting language models for medical advice before or after visiting doctors, indicating growing trust in AI for health-related queries.

Societal Implications of AGI

  • Carl Yang reflects on a prevalent belief in Silicon Valley regarding AGI leading to a permanent underclass based on capital access for compute resources.
  • He shares insights into the urgency felt by some individuals to accumulate wealth before AGI becomes mainstream, highlighting differing perspectives on future societal structures.

Transitioning Between Models

Discussion on GPT-5 and AI Model Comparisons

Initial Reactions to GPT-5

  • Dylan Patel, CEO of Semi Analysis, expresses disappointment with GPT-5, stating it is "disappointing" without further elaboration.

User Perspectives on AI Models

  • A comment from Santiago suggests Claude's performance surpasses that of GPT models, indicating ongoing preference for Claude 3.5 even after the release of newer versions like GPT-5 and Claude 4.

Critique of Incremental Improvements

  • Amjad Msad, CEO of Replit, comments on the diminishing returns in AI model advancements, suggesting that improvements are becoming less significant and implying that GPT-5 may not meet expectations.

The Need for Structural Development in AI

  • The discussion emphasizes the importance of building infrastructure around AI models. It compares raw intelligence (the model's capabilities) to a powerful engine needing a car (scaffolding) to be effective.

Performance Benchmarks and Competition

Video description

Cancel your AI subscriptions and try this All-in-One AI Super assistant that's 10x better: https://chatllm.abacus.ai/ffb Try this God Tier AI Agent that literally does everything: https://deepagent.abacus.ai/ffb Download The Matthew Berman Vibe Coding Playbook (free) 👇🏼 https://bit.ly/3I2J0YQ Download Humanities Last Prompt Engineering Guide (free) 👇🏼 https://bit.ly/4kFhajz Join My Newsletter for Regular AI Updates 👇🏼 https://forwardfuture.ai Discover The Best AI Tools👇🏼 https://tools.forwardfuture.ai My Links 🔗 👉🏻 X: https://x.com/matthewberman 👉🏻 Instagram: https://www.instagram.com/matthewberman_ai 👉🏻 Discord: https://discord.gg/xxysSXBxFW Media/Sponsorship Inquiries ✅ https://bit.ly/44TC45V Links: https://x.com/ArtificialAnlys/status/1953507703105757293 https://x.com/ArtificialAnlys/status/1953523986526351576?t=uT_YEZb5P8kAfcygQ8jVZA&s=19 https://x.com/willccbb/status/1953503727517938135?t=t-geHaj3Hh09pCEMX8h6MQ&s=19 https://x.com/theo/status/1953514692439347419?t=Xn_Q-V81kmWnW8amxn6ZSA&s=19 https://x.com/lmarena_ai/status/1953504958378356941?t=BIaUPSF_ZVgXiYlkTp_Abw&s=19 https://x.com/elder_plinius/status/1953548090117665016?s=46 https://x.com/sama/status/1953530707269366234 https://x.com/kitlangton/status/1953503594650521992 https://x.com/sama/status/1953529799219319205 https://x.com/ArtificialAnlys/status/1953545346480845280 https://x.com/Stagehanddev/status/1953575671491706904 https://x.com/mckaywrigley/status/1953575746901094499 https://x.com/cdngdev/status/1953505957239263293 https://x.com/_Borriss_/status/1953509038618009988 https://x.com/Yuhu_ai_/status/1953551132921671712 https://simonwillison.net/2025/Aug/7/gpt-5/ https://x.com/trycua/status/1953583236501631084 https://x.com/aidan_mclau/status/1953511599366455687?t=t3nJHE6FgcmDieImpbzcyg&s=19 https://x.com/vasumanmoza/status/1953531950137815374?t=Jets2YFQT1gZpHKYiT8Riw&s=19 https://x.com/netcapgirl/status/1953526050572452109?t=R_CsMYMjUgCozEZjLTn3CQ&s=19 https://x.com/chiefofstuffs/status/1953536120282911020?t=vQjNuMG59rQZxtjTxRtfpQ&s=19 https://x.com/xeophon_/status/1953504619188908074?t=0vReUIv-QNWkODzueyC8bA&s=19 https://x.com/dylan522p/status/1953646457908547979?t=PT4cPUXS_kBkiei-VN6lYw&s=19 https://x.com/amasad/status/1953697015311085636?t=GfdR1ASwLLNvHUbcQJytHg&s=19 https://x.com/Andr3jH/status/1953522044156428660?t=vSTC6t8t6UHxqscCnrYbjg&s=19 https://x.com/elonmusk/status/1953532769881272548?t=NE7CgKH2206grJifabpLVQ&s=19