Opus 4.5 Sees the Whole System

AI Model Performance Comparison

Initial AI Experience

  • The speaker reflects on a surprising moment with AI, indicating a significant change in model capabilities.
  • They tested two advanced models, Sonnet 4.5 and Codex Max High, on an old project, and neither could solve a relatively simple problem.

New Models' Success

  • After struggling with the initial models, the speaker tried Gemini 3 and Opus 4.5, which solved the problem immediately.
  • These new models not only fixed the original bug but also identified and corrected related issues that were not explicitly mentioned.

Problem Description

  • The issue involved a React application where dragging one card over another caused an error due to state updates triggering re-renders.
  • The speaker created a branch of the application to replicate the problem for testing across different models.

Testing Results

  • High-performing models like Codex Max High failed to resolve the issue even after extensive attempts.
  • The speaker draws on Opus 4.5's outputs in the presentation materials used to explain the problem.

Understanding the Bug

  • A detailed explanation is provided about how dragging cards creates ghosted representations in the DOM leading to repeated state changes and errors.
  • The cards keep swapping up and down, and each swap fires another drag event, so the cycle never settles and React reports "maximum update depth exceeded" errors.
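
The inner loop described above can be modeled without React. The following is a minimal sketch, not the project's actual code: the names (`DragState`, `onDragOverNaive`, `onDragOverGuarded`, `countUpdates`) are hypothetical, and the loop in `countUpdates` assumes that each reorder leaves the pointer over the same target card, which fires another drag-over event. The `limit` argument is an arbitrary stand-in for React's cap on nested updates.

```typescript
type CardId = string;

interface DragState {
  order: CardId[];
  lastTarget: CardId | null; // last card the dragged card was swapped with
}

type DragOverHandler = (state: DragState, dragged: CardId, target: CardId) => DragState;

// Return a new array with the positions of cards a and b exchanged.
function swapCards(order: CardId[], a: CardId, b: CardId): CardId[] {
  const i = order.indexOf(a);
  const j = order.indexOf(b);
  if (i < 0 || j < 0 || i === j) return order;
  const next = order.slice();
  next[i] = b;
  next[j] = a;
  return next;
}

// Naive handler: every drag-over event produces a new state object,
// so events caused by the re-render itself keep flipping the list.
function onDragOverNaive(state: DragState, dragged: CardId, target: CardId): DragState {
  return { order: swapCards(state.order, dragged, target), lastTarget: target };
}

// Guarded handler (assumed fix): ignore repeat events for the target that was
// just processed and return the same object, so nothing re-renders.
// A real handler would also reset lastTarget when the drag leaves or ends.
function onDragOverGuarded(state: DragState, dragged: CardId, target: CardId): DragState {
  if (dragged === target || state.lastTarget === target) return state;
  return { order: swapCards(state.order, dragged, target), lastTarget: target };
}

// Simulate the feedback loop: keep firing drag-over on the same target
// until the state stops changing or the update limit is reached.
function countUpdates(handler: DragOverHandler, limit: number): number {
  let state: DragState = { order: ["a", "b", "c"], lastTarget: null };
  let updates = 0;
  while (updates < limit) {
    const next = handler(state, "a", "b");
    if (next === state) break; // no state change, the loop settles
    state = next;
    updates++;
  }
  return updates;
}
```

Under these assumptions, `countUpdates(onDragOverNaive, 50)` runs until it hits the limit, while `countUpdates(onDragOverGuarded, 50)` settles after a single update.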

AI's Limitations and Personal Goals

Understanding AI's Current Challenges

  • The speaker reflects on the limitations of AI, suggesting it is not yet capable of solving certain problems independently.
  • The speaker introduces himself as Matt, expressing his passion for exploring AI tools and understanding their boundaries.

Personal Milestones and Engagement

  • Matt shares a personal goal to reach 25,000 subscribers by December 25th, indicating this target is significant for him.
  • He humorously proposes to dress up in a costume for a celebratory video if he reaches his subscriber goal, showcasing his willingness to engage with the audience.

Technical Insights from Gemini 3 and Opus 4.5

  • Upon testing Gemini 3 and Opus 4.5 with the same problem set, both systems successfully solved the issues in one attempt.
  • Matt praises the user interface (UI) generated by Opus 4.5, noting that it exceeded his expectations despite not being requested.

Identifying Systemic Issues

  • He describes an underlying issue where state changes trigger unnecessary re-renders in the application due to poor design choices.
  • Each card in the UI connects to multiple database listeners; this design leads to performance bottlenecks during user interactions.

Performance Bottlenecks Explained

  • Every minor mouse movement causes re-renders that unsubscribe and resubscribe database listeners, creating excessive calls per second.
  • This rapid cycle of tearing down and recreating cards results in React struggling under high loads due to overwhelming overhead.
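
The listener churn in this outer loop can be illustrated with a small counting model. This is a sketch under stated assumptions, not the application's code: `FakeDb` is a hypothetical stand-in for a database client whose `listen` call returns an unsubscribe function, each card holds one listener here (the summary says the real cards hold several), and `renderChurning` and `renderStable` are illustrative names for the two subscription strategies.

```typescript
// Hypothetical database stand-in: it only counts subscribe and unsubscribe calls.
class FakeDb {
  subscribeCalls = 0;
  unsubscribeCalls = 0;

  // Register a listener and return a function that removes it.
  listen(_docId: string): () => void {
    this.subscribeCalls++;
    return () => {
      this.unsubscribeCalls++;
    };
  }
}

// Churning strategy: every render tears down all listeners and recreates them,
// which is what happens when cards are destroyed and rebuilt on each re-render.
function renderChurning(db: FakeDb, cardIds: string[], renders: number): void {
  let cleanups: Array<() => void> = [];
  for (let r = 0; r < renders; r++) {
    cleanups.forEach((stop) => stop());
    cleanups = cardIds.map((id) => db.listen(id));
  }
  cleanups.forEach((stop) => stop()); // final unmount
}

// Stable strategy (assumed fix): keep one listener per card id across renders
// and subscribe only when a card id has no listener yet.
function renderStable(db: FakeDb, cardIds: string[], renders: number): void {
  const active: Record<string, () => void> = {};
  for (let r = 0; r < renders; r++) {
    for (const id of cardIds) {
      if (active[id] === undefined) active[id] = db.listen(id);
    }
  }
  Object.keys(active).forEach((id) => {
    const stop = active[id];
    if (stop) stop(); // final unmount
  });
}
```

With three cards and 100 renders, the churning version makes 300 subscribe calls and 300 unsubscribe calls, while the stable version makes 3 of each. If a mouse movement triggers a render, the churning strategy multiplies that cost by the number of listeners on screen.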

Solutions Found by Advanced Models

  • Both Gemini 3 and Opus 4.5 managed to resolve these systemic issues without experiencing similar performance degradation.
  • Matt expresses curiosity about how these models identified both internal and external problems contributing to system inefficiencies.

Investigative Approach Taken

  • To understand their solutions better, he requests detailed documentation from both models outlining changes made and rationale behind them.

Evaluation of AI Models: Gemini vs. Claude

Insights on Model Performance

  • The speaker discusses an evaluation comparing the performance of two AI models, Gemini and Claude, highlighting stark differences in their problem-solving approaches.
  • Both models excelled at transforming complex documents into understandable reports, aiding users in grasping issues within their code.
  • The speaker emphasizes the importance of requesting visualizations or infographics from these models to better understand problems encountered during coding.

Comparative Analysis of Solutions

  • Claude addressed a specific issue effectively, while Gemini managed to resolve both that issue and an additional one, showcasing its superior capability.
  • A metaphor is used to describe the models' approaches: Claude as an emergency room doctor stabilizing a patient versus Gemini as a surgeon removing internal defects.

Detailed Write-Up Evaluation

  • Opus 4.5's output is praised for its thoroughness in diagnosing errors and tracing execution paths; the comparison rates Claude's fix as a 60% solution against 100% for Gemini's.
  • The write-up provides diagrams illustrating different modeling approaches and their interactions, making it accessible for broader audiences beyond technical experts.

Understanding Architectural Differences

  • The analysis reveals that while Claude focused on symptoms at the React pattern level, Gemini traced execution paths to identify underlying circular dependencies.
  • This distinction highlights how each model approached the same problem differently; Claude remained fixated on one aspect while Gemini explored broader architectural contexts.

Notable Advancements in AI Capabilities

  • The speaker notes that both Gemini and Opus demonstrated a new ability to consider systemic issues rather than just isolated symptoms—an exciting development in AI capabilities.
  • This shift represents a significant step change in software development tools, moving from basic code copying to more sophisticated solutions capable of understanding entire systems.

Conclusion on Current Developments

  • While acknowledging ongoing challenges in software development, the speaker expresses optimism about recent advancements indicating meaningful progress in AI model performance.
  • Comparing earlier models like Sonnet 4.5 with current ones like Opus 4.5 illustrates substantial improvements in reliability and error handling.

Understanding Complex Systems Awareness

The Challenge of System Awareness in Enterprises

  • The speaker discusses the limitations of current systems in understanding complex, nested structures within enterprises, highlighting a common issue faced by professionals in such environments.
  • Emphasis is placed on the necessity for these systems to focus on individual moving parts rather than attempting to grasp the entirety of the system at once.
  • The speaker suggests that we are witnessing the early stages of development in this area, indicating potential future advancements.
  • There is an acknowledgment that while progress is being made, it remains at a nascent stage and requires further exploration and refinement.

Video description

Sonnet 4.5 and Codex Max High both failed on a React bug. For hours. Death spirals, circular fixes, no progress. Then Opus 4.5 and Gemini 3 dropped, and solved it in one shot. But here's what got me: they didn't just fix the bug I showed them. They found a second architectural problem I never mentioned: a Firestore listener loop wrapping the whole system. They saw the forest, not just the trees.

This video breaks down exactly what happened: the bug, why two state-of-the-art models couldn't escape their local fix patterns, and why I think Opus 4.5 and Gemini 3 represent the first glimpse of system-level AI coding. I walk through the actual error, show diagrams of both the inner loop (React re-renders) and the outer loop (Firestore subscriptions), and compare the diagnostic write-ups each model produced. One model acted like an ER doctor, stabilizing symptoms. The other acted like a surgeon, finding the root cause.

This isn't AGI. But it is a step change. And if you've ever hit a wall where AI keeps fixing the same thing over and over without solving your actual problem, this might explain why, and what's different now.

LINKS
Claude Opus 4.5: https://anthropic.com/claude
Gemini 3: https://deepmind.google/technologies/gemini/
Claude Code CLI: https://docs.anthropic.com/en/docs/claude-code

#Opus45 #Gemini3 #AICoding #ClaudeAI #SystemLevelAI

00:00 - Intro
01:42 - The Problem
02:18 - Asking for Sonnet's help
03:09 - What was Sonnet doing?
05:24 - Sonnet's fix
06:33 - Make me dress up
07:54 - Give it to the new guys!
08:22 - The outer problem
11:03 - Opus saw both?
12:15 - Visualize your reports!
13:49 - What was the difference?
16:14 - Conclusion