Gemini 3 isn't the answer. How to Solve 1 Million Steps with 0 Errors

Solving a Million-Step LLM Task with Zero Errors

Introduction to the Paper

  • A new paper titled "Solving a Million-Step LLM Task with Zero Errors" was published in November 2025 by Cognizant AI Lab. It addresses a major failure mode of current AI agents: unreliability on long tasks.
  • Current AI agents can perform tasks like writing code or planning trips but struggle with long, complex tasks, often leading to failures such as drifting and hallucination.

Rethinking Model Limitations

  • The authors argue that the problem is not merely model capability or context window size; rather, it is an engineering architecture issue.
  • They achieved success without using advanced models or large context windows through their system called MAKER, an example of what the paper calls Massively Decomposed Agentic Processes (MDAPs).

Understanding Probability and Task Complexity

  • The paper highlights how per-step errors compound: a model that is 99% accurate on each step has only about a 37% chance of completing 100 steps without a mistake (0.99^100 ≈ 0.37), assuming errors are independent.
  • For example, solving real-world tasks often requires thousands of steps, making high accuracy crucial for success.
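
The compounding effect can be checked with a few lines of Python. This is a minimal sketch that assumes every step fails independently with the same probability; the function name `chain_success` is illustrative and does not come from the paper.

```python
def chain_success(per_step_accuracy: float, steps: int) -> float:
    """Probability that every one of `steps` independent steps is correct."""
    return per_step_accuracy ** steps


if __name__ == "__main__":
    # Show how quickly a 99%-accurate step decays over longer chains.
    for steps in (10, 100, 1_000, 1_000_000):
        print(f"{steps:>9} steps: {chain_success(0.99, steps):.3g}")
```

At a million steps the result underflows to zero in floating point, which is the practical point: no realistic per-step accuracy survives that many steps without some form of error correction.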

Benchmarking Against Tower of Hanoi

  • Researchers used the Tower of Hanoi puzzle as a benchmark. With 20 discs it requires 2^20 − 1 = 1,048,575 moves, hence the "million-step" task. Standard models failed due to context drift.
  • Context drift occurs when models become confused by their own past outputs as conversation history grows.
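
The move count follows from the standard recursive solution, sketched below. The peg labels and function names are illustrative choices, not taken from the paper.

```python
from typing import Iterator


def min_moves(discs: int) -> int:
    """Minimum number of moves for a Tower of Hanoi with `discs` discs."""
    return 2 ** discs - 1


def hanoi(n: int, src: str, dst: str, aux: str) -> Iterator[tuple[int, str, str]]:
    """Yield (disc, from_peg, to_peg) for the optimal solution."""
    if n == 0:
        return
    yield from hanoi(n - 1, src, aux, dst)  # clear the n-1 smaller discs
    yield (n, src, dst)                     # move the largest disc
    yield from hanoi(n - 1, aux, dst, src)  # restack the smaller discs


if __name__ == "__main__":
    print(min_moves(20))
    print(sum(1 for _ in hanoi(10, "A", "C", "B")))
```

Every one of those moves must be correct for the puzzle to be solved, which makes it a clean stress test for long-horizon reliability.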

The MAKER Framework Explained

Pillar One: Maximal Decomposition

  • MAKER's first pillar treats each step as an isolated problem: the agent sees the current state, not the history of past actions. This prevents earlier outputs from confusing later steps.
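
A stateless step loop might look like the sketch below. It is an illustration under stated assumptions rather than the paper's implementation: the state encoding, the prompt wording, and the `choose_move` callable (a stand-in for one fresh model call) are all hypothetical.

```python
from typing import Callable

State = tuple[tuple[int, ...], ...]  # three pegs, discs listed bottom to top
Move = tuple[int, int]               # (source peg index, target peg index)


def build_prompt(state: State) -> str:
    """Render only the current state; no chat history is included."""
    pegs = "\n".join(f"peg {i}: {list(p)}" for i, p in enumerate(state))
    return f"Current pegs (bottom to top):\n{pegs}\nReply with the next move."


def apply_move(state: State, move: Move) -> State:
    """Return the new state, rejecting illegal moves."""
    src, dst = move
    pegs = [list(p) for p in state]
    if not pegs[src]:
        raise ValueError("source peg is empty")
    if pegs[dst] and pegs[dst][-1] < pegs[src][-1]:
        raise ValueError("cannot place a larger disc on a smaller one")
    pegs[dst].append(pegs[src].pop())
    return tuple(tuple(p) for p in pegs)


def run(state: State, choose_move: Callable[[str], Move], steps: int) -> State:
    """Each iteration is a fresh call that sees the current state only."""
    for _ in range(steps):
        state = apply_move(state, choose_move(build_prompt(state)))
    return state
```

Because the prompt is rebuilt from the state on every iteration, its size stays constant no matter how many steps have already run.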

Pillar Two: Red Flagging

  • The second pillar treats malformed output as a warning sign of a faulty reasoning step. If a response deviates from the expected format, the system discards it and retries instead of attempting to repair it.
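
The discard-and-retry idea can be sketched as follows. The `move=src,dst` output format, the length limit, and the `call_model` callable are assumptions made for illustration; the summary does not specify the exact checks the authors use.

```python
import re
from typing import Callable, Optional

# Hypothetical strict output format: "move=<src>,<dst>" with single-digit pegs.
MOVE_PATTERN = re.compile(r"move=(\d),(\d)")


def parse_or_flag(raw: str, max_chars: int = 20) -> Optional[tuple[int, int]]:
    """Return the parsed move, or None if the response is red-flagged."""
    text = raw.strip()
    if len(text) > max_chars:           # assumed flag: suspiciously long answer
        return None
    match = MOVE_PATTERN.fullmatch(text)
    if match is None:                   # flag: wrong format, do not repair
        return None
    return int(match.group(1)), int(match.group(2))


def sample_until_valid(call_model: Callable[[], str], max_attempts: int = 5) -> tuple[int, int]:
    """Resample from scratch until a response passes the format check."""
    for _ in range(max_attempts):
        move = parse_or_flag(call_model())
        if move is not None:
            return move
    raise RuntimeError("no well-formed response within the attempt budget")
```

Note that the flagged response is thrown away entirely; nothing tries to extract a move from text that almost matches.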

Pillar Three: K Voting Mechanism

  • The third pillar employs a voting mechanism where multiple answers are generated for each step. Even less accurate models can achieve high reliability through this method.
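
One way to implement per-step voting is to keep sampling until one answer leads every other answer by K votes. The sketch below assumes that stopping rule; the `sample` callable stands in for a single model call and the sample budget is an illustrative safeguard.

```python
from collections import Counter
from typing import Callable, Hashable


def vote_ahead_by_k(sample: Callable[[], Hashable], k: int = 3, max_samples: int = 50) -> Hashable:
    """Sample answers until the leader is k votes ahead of the runner-up."""
    counts: Counter = Counter()
    for _ in range(max_samples):
        counts[sample()] += 1
        ranked = counts.most_common(2)
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        if ranked[0][1] - runner_up >= k:
            return ranked[0][0]
    raise RuntimeError("no answer reached the required lead")
```

When the model is usually right, the correct answer pulls ahead after only a few samples, so the extra cost is concentrated on the steps where the model is genuinely uncertain.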

Economic Implications of the Findings

  • The research reveals that smaller models combined with voting mechanisms can be more cost-effective than larger models for complex tasks.
  • It suggests that simpler models performing single logical steps are sufficient and cheaper than relying on high-end models for every task.

Conclusion and Future Directions

  • While the findings present significant advancements in AI task execution reliability and cost-effectiveness, they also open avenues for further exploration into architectural frameworks that enhance performance across various applications.

Understanding Software Development Strategies

Importance of November 2025 for Developers

  • The paper, published in November 2025, gives developers a concrete blueprint they can apply to current software development practices.
  • Developers are encouraged to stop relying on chat history for state management and instead define their atomic state clearly.

Defining Atomic State in Development

  • For coding tasks, the atomic state is represented by the file system and compiler error logs.
  • In data analysis, the atomic state corresponds to the data frame being utilized.
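
For a coding agent, an explicit atomic state could be modeled as below. This is a hypothetical sketch: the class name, fields, and prompt layout are illustrative, chosen only to show the state being passed in full on each call instead of being recovered from chat history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodingState:
    """Everything the next step needs: file contents plus compiler errors."""
    files: dict[str, str]
    compiler_errors: tuple[str, ...] = ()

    def to_prompt(self) -> str:
        """Serialize the state so a fresh model call can act on it."""
        parts = [f"--- {path} ---\n{body}" for path, body in sorted(self.files.items())]
        errors = "\n".join(self.compiler_errors) or "(none)"
        return "\n".join(parts) + f"\n--- compiler errors ---\n{errors}"
```

After each step the agent's edit is applied to the files, the compiler is rerun, and a new `CodingState` is built for the next call.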

Task Decomposition Techniques

  • Developers should break down complex tasks into micro-level components rather than asking an agent to perform large functions.
  • Suggested breakdown includes having separate agents for defining inputs, writing function signatures, and implementing specific logic (e.g., tax brackets).

Implementing Voting Mechanisms

  • Critical decision points should involve voting mechanisms; not every step requires this process.
  • For significant decisions where errors could disrupt processes, five parallel calls can be initiated. Disagreement among these signals uncertainty in the model's output.
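
The five-call check can be sketched with a thread pool. The `call_model` callable and the 80% agreement threshold are assumptions for illustration; the right threshold depends on the task.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def vote_in_parallel(
    call_model: Callable[[str], str],
    prompt: str,
    n: int = 5,
    min_agreement: float = 0.8,
) -> tuple[str, bool]:
    """Run n calls concurrently; return (most common answer, agreed?)."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        answers = list(pool.map(lambda _: call_model(prompt), range(n)))
    winner, votes = Counter(answers).most_common(1)[0]
    # Low agreement signals uncertainty: escalate or retry instead of trusting it.
    return winner, votes / n >= min_agreement
```

A caller would accept the winner only when the second value is true, and otherwise resample, decompose the step further, or hand it to a human.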

Reliability as an Engineering Challenge

  • The discussion emphasizes that reliability issues are engineering problems that can be addressed now without waiting for model companies to resolve hallucinations.
  • By treating LLMs (Large Language Models) as stochastic components that need redundancy and strict input verification, developers can build systems that are more reliable than the models they are built on.

Video description

Paper: https://arxiv.org/abs/2511.09030

A revolutionary paper just dropped (November 2025) that changes everything we know about building AI Agents. We’ve been obsessed with bigger context windows and smarter models, but the "MAKER Framework" proves we were wrong. This breakdown explains how researchers achieved 1,000,000 logical steps with ZERO errors using "dumb" models and a brilliant new architecture.

In this video:

  • Why your agents fail at long tasks (The Math of Failure).
  • The "MAKER" Framework explained visually.
  • Why "Stateless" agents beat long-context models.
  • The Scaling Law: Why Small Models + Voting is cheaper than GPT-4.

#AI #LLM #MachineLearning #Agents #SoftwareArchitecture #gemini-3