Name: Gemini 3 isn't the answer. How to Solve 1 Million Steps with 0 Errors
Uploaded: 2025-11-20T02:25:34.000Z
Duration: 16 min 19 s

Solving a Million-Step LLM Task with Zero Errors

Introduction to the Paper

The paper "Solving a Million-Step LLM Task with Zero Errors," published in November 2025 by the Cognizant AI Lab, addresses a major failure mode in the AI industry.

Current AI agents can write code or plan trips, but they struggle with long, complex tasks, where they tend to drift off course or hallucinate.

Rethinking Model Limitations

The authors argue that the problem is not merely about model capabilities or context window limitations; rather, it’s an engineering architecture issue.

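They solved the task without frontier models or large context windows by using their framework, Maker (written MAKER in the paper). The name stands for Maximal Agentic decomposition, first-to-ahead-by-K Error correction, and Red-flagging; the paper calls the broader class of such systems Massively Decomposed Agentic Processes (MDAPs).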

Understanding Probability and Task Complexity

The paper highlights how errors compound across steps: if each step is independent and 99% accurate, a 100-step task succeeds only about 37% of the time (0.99^100 ≈ 0.366).

Real-world tasks often require thousands of steps, so per-step accuracy must be extremely high for the task as a whole to succeed.
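The compounding effect can be checked with a few lines of arithmetic. This is a minimal sketch that assumes every step succeeds independently with the same probability, which is a simplification; the function name is illustrative and not taken from the paper.

```python
def chain_success_probability(per_step_accuracy: float, steps: int) -> float:
    """Probability that all `steps` independent steps succeed."""
    return per_step_accuracy ** steps


# Print the overall success rate for increasingly long tasks.
for steps in (10, 100, 1_000):
    overall = chain_success_probability(0.99, steps)
    print(f"{steps:>5} steps at 99% per-step accuracy: {overall:.4%} overall")
```

At 99% per-step accuracy the overall success rate falls from roughly 90% at 10 steps to roughly 37% at 100 steps, and to well under 0.01% at 1,000 steps.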

Benchmarking Against Tower of Hanoi

Researchers used the Tower of Hanoi puzzle as a benchmark; with 20 discs it requires 1,048,575 moves (2^20 - 1), which is where the "million steps" in the title comes from. Standard models failed due to context drift.

Context drift occurs when models become confused by their own past outputs as conversation history grows.
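The size of the benchmark follows from the standard closed form for Tower of Hanoi: the shortest solution for n discs takes 2^n - 1 moves. The sketch below computes that count and cross-checks it against the classic recursive solution for a small puzzle; the function names are illustrative.

```python
from typing import Iterator


def min_hanoi_moves(discs: int) -> int:
    """Length of the shortest solution for `discs` discs."""
    return 2 ** discs - 1


def hanoi_moves(n: int, source: str = "A", target: str = "C",
                spare: str = "B") -> Iterator[tuple[int, str, str]]:
    """Yield the optimal moves as (disc, from_peg, to_peg)."""
    if n == 0:
        return
    # Move the n-1 smaller discs out of the way, move disc n, then restack.
    yield from hanoi_moves(n - 1, source, spare, target)
    yield (n, source, target)
    yield from hanoi_moves(n - 1, spare, target, source)


# Cross-check the closed form against the generated sequence for 10 discs.
assert sum(1 for _ in hanoi_moves(10)) == min_hanoi_moves(10)
print(min_hanoi_moves(20))  # 1048575
```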

The Maker Framework Explained

Pillar One: Maximal Decomposition

Maker's first pillar treats each step as an isolated problem: the model sees only the current state, not the history of past actions. With no growing transcript to reread, it cannot be confused by its own earlier output.
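One way to picture this is a loop in which every model call gets a freshly built prompt containing only the current puzzle state. The sketch below is an illustration under stated assumptions, not the paper's implementation: the prompt wording, the `from,to` reply format, and the scripted `SCRIPT` lookup standing in for a real model call are all invented here.

```python
from typing import Callable

# Three pegs, each listing its discs from bottom to top.
State = tuple[tuple[int, ...], ...]


def build_prompt(state: State) -> str:
    """Build a prompt from the current state only; no earlier turns are included."""
    return f"Pegs: {state}. Reply with the next move as 'from,to'."


def apply_move(state: State, src: int, dst: int) -> State:
    """Return the state after moving the top disc of peg `src` onto peg `dst`."""
    pegs = [list(peg) for peg in state]
    if not pegs[src]:
        raise ValueError("source peg is empty")
    if pegs[dst] and pegs[dst][-1] < pegs[src][-1]:
        raise ValueError("cannot place a larger disc on a smaller one")
    pegs[dst].append(pegs[src].pop())
    return tuple(tuple(peg) for peg in pegs)


def solve(state: State, goal: State, call_model: Callable[[str], str],
          max_steps: int) -> State:
    """Advance one isolated step at a time until the goal state is reached."""
    for _ in range(max_steps):
        if state == goal:
            return state
        reply = call_model(build_prompt(state))  # fresh call, no chat history
        src, dst = (int(part) for part in reply.split(","))
        state = apply_move(state, src, dst)
    if state == goal:
        return state
    raise RuntimeError("step budget exhausted before reaching the goal")


# Hypothetical stand-in for a model: a scripted answer for each 2-disc state.
SCRIPT = {
    build_prompt(((2, 1), (), ())): "0,1",
    build_prompt(((2,), (1,), ())): "0,2",
    build_prompt(((), (1,), (2,))): "1,2",
}

final_state = solve(((2, 1), (), ()), ((), (), (2, 1)), SCRIPT.__getitem__, max_steps=10)
print(final_state)
```

Because the state is carried by the program rather than by the conversation, the prompt stays the same size at step one and at step one million.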

Pillar Two: Red Flagging

The second pillar treats formatting errors as a warning sign of underlying logic errors. If an output deviates from the expected format, it is discarded and the step is retried; no attempt is made to repair the malformed answer.
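A retry-on-red-flag policy can be sketched as a strict parser plus a resampling loop. The reply format, the length limit, and the `call_model` callable below are assumptions made for illustration; the only behavior taken from the text is that a malformed reply is thrown away and never repaired.

```python
import re
from typing import Callable, Optional

MOVE_PATTERN = re.compile(r"[0-2],[0-2]")  # assumed reply format: "from,to"
MAX_REPLY_CHARS = 20                       # assumed limit; long replies are flagged too


def parse_or_flag(reply: str) -> Optional[tuple[int, int]]:
    """Return the parsed move, or None if the reply raises a red flag."""
    text = reply.strip()
    if len(text) > MAX_REPLY_CHARS:
        return None
    if MOVE_PATTERN.fullmatch(text) is None:
        return None
    src, dst = text.split(",")
    return int(src), int(dst)


def sample_until_clean(call_model: Callable[[str], str], prompt: str,
                       max_attempts: int = 5) -> tuple[int, int]:
    """Resample until a reply passes the format check; never patch a bad reply."""
    for _ in range(max_attempts):
        move = parse_or_flag(call_model(prompt))
        if move is not None:
            return move
    raise RuntimeError("every attempt was red-flagged")


# Hypothetical stand-in model: the first reply is chatty, the second is clean.
replies = iter(["I think the move is 0 to 1", "0,1"])
move = sample_until_clean(lambda prompt: next(replies), "next move?")
print(move)  # (0, 1)
```

Discarding is a deliberate choice here: a reply that ignores the format may also rest on faulty reasoning, so extracting an answer from it risks keeping the error.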

Pillar Three: K Voting Mechanism

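The third pillar is a voting mechanism: several answers are sampled for each step, and an answer is accepted once it leads every alternative by K votes. Even a less accurate model can reach high per-step reliability this way, as long as it picks the right answer more often than any wrong one.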
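A first-to-ahead-by-K vote can be sketched as follows: keep sampling until the leading answer is K votes ahead of the runner-up. This is a minimal sketch, not the paper's code; `sample` is a hypothetical zero-argument callable that returns one candidate answer per call, and `max_samples` is an assumed safety cap.

```python
from collections import Counter
from typing import Callable, Hashable


def first_to_ahead_by_k(sample: Callable[[], Hashable], k: int,
                        max_samples: int = 100) -> Hashable:
    """Sample answers until one leads the runner-up by at least `k` votes."""
    votes: Counter = Counter()
    for _ in range(max_samples):
        votes[sample()] += 1
        ranked = votes.most_common(2)
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        if ranked[0][1] - runner_up >= k:
            return ranked[0][0]
    raise RuntimeError("no answer reached the required lead")


# Hypothetical stand-in sampler: mostly "A", with one dissenting "B".
answers = iter(["A", "B", "A", "A", "A"])
winner = first_to_ahead_by_k(lambda: next(answers), k=2)
print(winner)  # A
```

In this run the vote ends after the fourth sample, when "A" leads "B" by 3 to 1. Steps where the model agrees with itself finish quickly, and extra samples are spent only where it wavers.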

Economic Implications of the Findings

The research reveals that smaller models combined with voting mechanisms can be more cost-effective than larger models for complex tasks.

It suggests that simpler models performing single logical steps are sufficient and cheaper than relying on high-end models for every task.

Conclusion and Future Directions

The findings show a way to make AI task execution both more reliable and cheaper, and they invite further work on architectural frameworks that bring the same gains to other applications.

Understanding Software Development Strategies

A Blueprint for Developers as of November 2025

As of November 2025, the paper's approach offers a blueprint that developers can apply to the software they are building today.

Developers are encouraged to stop relying on chat history for state management and instead define their atomic state clearly.

Defining Atomic State in Development

For coding tasks, the atomic state is represented by the file system and compiler error logs.

In data analysis, the atomic state corresponds to the data frame being utilized.
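For a coding task, the idea of an explicit atomic state might look like the sketch below: a small record holding the files and the latest compiler errors, from which each prompt is rebuilt from scratch. The class, its fields, the prompt layout, and the sample file and error message are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CodingState:
    """Everything a single step needs to know; no chat history."""
    files: dict[str, str]                                      # path -> source text
    compiler_errors: list[str] = field(default_factory=list)   # latest build output


def render_prompt(state: CodingState, instruction: str) -> str:
    """Rebuild the full prompt from the state on every call."""
    sections = [f"Task: {instruction}"]
    for path, source in sorted(state.files.items()):
        sections.append(f"--- {path} ---\n{source}")
    if state.compiler_errors:
        sections.append("Compiler errors:\n" + "\n".join(state.compiler_errors))
    return "\n\n".join(sections)


# Hypothetical example state.
state = CodingState(
    files={"tax.py": "def tax(income):\n    return income * rate\n"},
    compiler_errors=["tax.py:2: name 'rate' is not defined"],
)
prompt = render_prompt(state, "Fix the reported error.")
print(prompt)
```

After each step the program updates the state (rewrites a file, reruns the build) and renders a new prompt, so nothing depends on what the model said in earlier turns.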

Task Decomposition Techniques

Developers should break down complex tasks into micro-level components rather than asking an agent to perform large functions.

Suggested breakdown includes having separate agents for defining inputs, writing function signatures, and implementing specific logic (e.g., tax brackets).

Implementing Voting Mechanisms

Voting should be reserved for critical decision points; not every step needs it.

For significant decisions where errors could disrupt processes, five parallel calls can be initiated. Disagreement among these signals uncertainty in the model's output.
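The five-call check can be sketched with a thread pool and a vote count. This is a sketch under assumptions: `call_model` is a hypothetical callable that takes a prompt and returns an answer string, and the 0.8 agreement threshold is an arbitrary choice, not a value from the paper.

```python
import queue
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def vote_in_parallel(call_model: Callable[[str], str], prompt: str, n: int = 5,
                     min_agreement: float = 0.8) -> tuple[str, float, bool]:
    """Run `n` calls concurrently; return (top answer, agreement share, confident?)."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        replies = list(pool.map(call_model, [prompt] * n))
    answer, count = Counter(replies).most_common(1)[0]
    agreement = count / n
    return answer, agreement, agreement >= min_agreement


# Hypothetical stand-in model: a thread-safe queue of canned replies.
canned: "queue.Queue[str]" = queue.Queue()
for reply in ["yes", "yes", "no", "yes", "no"]:
    canned.put(reply)

answer, agreement, confident = vote_in_parallel(lambda prompt: canned.get(), "Apply the migration?")
print(answer, agreement, confident)  # yes 0.6 False
```

A split like three to two is the useful outcome here: it marks the decision as uncertain, so the caller can sample more, escalate to a stronger model, or ask a human.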

Reliability as an Engineering Challenge

The discussion emphasizes that reliability is an engineering problem that can be addressed now, without waiting for model vendors to eliminate hallucinations.

By treating LLMs (large language models) as stochastic components that need redundancy and strict input verification, developers can build systems that are more reliable than the underlying models are on their own.