Anthropic Just Killed Tool Calling
Anthropic's New Developer Tools: Programmatic Tool Calling
Introduction to Programmatic Tool Calling
- Anthropic has introduced significant developer tooling with the release of Sonnet 4.6, focusing on programmatic tool calling.
- This feature lets agents invoke tools from code instead of loading every call and result into the context window, saving tokens and improving accuracy.
Advantages Over Traditional Methods
- Anthropic's engineering work here argues that invoking tools through code is more effective than emitting traditional JSON tool-call structures.
- LLMs (Large Language Models) are better suited to code execution than to conventional tool calling, because their training data contains vastly more real code than synthetic tool-call formats.
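The contrast can be sketched concretely. The JSON payload and the `get_orders`/`summarize_spend` names below are hypothetical, chosen purely to illustrate the two styles:

```python
# Traditional tool calling: the model emits a JSON blob per call, and
# every intermediate result flows back through the context window.
traditional_call = {
    "name": "get_orders",
    "arguments": {"customer_id": "C-42", "limit": 100},
}

# Programmatic style: the model writes ordinary code, the kind it has
# seen billions of lines of during training. get_orders here is a
# hypothetical tool binding injected into the sandbox, not a real API.
def summarize_spend(get_orders):
    orders = get_orders(customer_id="C-42", limit=100)
    return sum(o["amount"] for o in orders)  # aggregation happens in code
```

In the second style, the 100 order records never need to pass through the model at all; only the single aggregated number does.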
Industry Impact and Adoption Trends
- Anthropic's innovations often lead to industry-wide adoption, as seen with MCP (Model Context Protocol).
- The context window problem is exacerbated by protocols like MCP, leading to inefficient use of space during user interactions.
Context Engineering and Its Importance
- Context engineering aims to optimize what information is loaded into the context window, discarding unnecessary data.
- Tool calls significantly contribute to context pollution; thus, optimizing them can enhance performance.
How Programmatic Tool Calling Works
- In programmatic tool calling, the agent writes code in a sandboxed environment, and that code invokes the tools, rather than the model making each call directly.
- This reduces token usage because intermediate results stay inside the sandbox; only the final output is returned to the model.
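The flow above can be sketched in a few lines of Python. Everything here (`run_in_sandbox`, `fetch_logs`, the model-written script) is an illustrative stand-in, not Anthropic's actual API:

```python
def run_in_sandbox(script, tools):
    """Execute model-written code with tool functions injected; return
    only what the script explicitly assigns to `result`."""
    namespace = dict(tools)
    exec(script, namespace)      # intermediate data lives only in here
    return namespace["result"]   # the sole output that re-enters context

# A script the model might write: boil 1,000 log rows down to one line.
model_script = """
rows = fetch_logs(limit=1000)
errors = [r for r in rows if r["level"] == "ERROR"]
result = f"{len(errors)} errors out of {len(rows)} rows"
"""

# Hypothetical tool binding: every 10th row is an error.
tools = {"fetch_logs": lambda limit: [
    {"level": "ERROR" if i % 10 == 0 else "INFO"} for i in range(limit)
]}

summary = run_in_sandbox(model_script, tools)
```

The 1,000 rows are fetched and filtered inside the sandbox; only the one-line summary string would be appended to the model's context.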
Timeline of Developments in Programmatic Tool Calling
- Cloudflare published a report in September 2025 advocating for programmatically invoking tools within an MCP server, showing potential token savings of 30%–80%.
- Anthropic echoed these findings in November 2025 with an article on using code execution with MCP to build more efficient agents.
Recent Advancements and Community Response
- Anthropic released advanced tools including a search function that optimizes token usage further.
- The open-source community rapidly adopted these concepts, leading to implementations across various projects such as Block's Goose agent and LiteLLM.
Conclusion on Current State and Future Directions
- As of now, these advancements have moved beyond beta testing into full support with dynamic filtering capabilities for web searches.
API Enhancements and Tool Support
Key Insights on LLMs and Code Generation
- Version 5.2 introduces support for over 20 different tools via the API, extending the capabilities of large language models (LLMs).
- LLMs are trained on billions of lines of code, making them particularly effective at generating and understanding code, whereas synthetic JSON tool-calling formats are far rarer in their training data.
- Anthropic's Sonnet 4.6 release includes two new tools, web search and dynamic filtering, which improve how agents interact with data.
Improvements in Web Search Capabilities
- The new features allow the model to write and execute code during web searches, filtering results before they enter the context window to enhance accuracy.
- Initial tests showed an average improvement of 11% in performance metrics while reducing input tokens by 24%, indicating significant efficiency gains.
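A minimal sketch of that filtering step, assuming a hypothetical `filter_hits` helper operating on already-fetched search hits (this is not Anthropic's actual search API):

```python
# Instead of pasting every search hit into the prompt, model-written
# code keeps only the hits matching the query terms, so far fewer
# input tokens ever reach the model.
def filter_hits(hits, must_contain):
    keep = [h for h in hits if must_contain.lower() in h["snippet"].lower()]
    return keep[:3]  # cap how much re-enters the context window

hits = [
    {"url": "a", "snippet": "Programmatic tool calling saves tokens"},
    {"url": "b", "snippet": "Unrelated celebrity news"},
    {"url": "c", "snippet": "Tool calling via code execution"},
]
relevant = filter_hits(hits, "tool calling")  # keeps "a" and "c" only
```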
Benchmark Performance Analysis
- The BrowseComp benchmark assesses an agent's ability to navigate websites for hard-to-find information; Sonnet improved from 33% to 46%, while Opus increased from 45% to 61%.
- In the deep-search QA benchmark, which evaluates finding multiple correct answers through web searches, Sonnet's F1 score rose from 52% to 59%.
Token Cost Considerations
- Token costs vary with how much filtering code the model generates: Sonnet's price-weighted token count decreased, while Opus's increased because it wrote more extensive filtering code.
- This indicates that a reduction in output tokens does not always correlate with lower token costs; careful consideration is needed when evaluating performance.
Utilizing New Features Effectively
- Users employing the search API need only enable data fetching; Anthropic will automatically optimize token usage by returning only relevant information.
- Additional tools have exited beta status, including code execution sandboxes and programmatic tool calling, along with detailed documentation provided for user guidance.
Implementation Structure
- To implement these tools effectively, users must define what each tool does alongside its input/output schema within a structured format.
- Instead of emitting traditional function calls, models like Claude now generate code directly to execute specific tasks, a pattern that is becoming standard industry practice.
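A tool definition in this style typically pairs a natural-language description with a JSON schema for the inputs. The field names below mirror common conventions for LLM tool definitions and are assumptions, not Anthropic's exact schema:

```python
# Illustrative tool definition: name, description, and input schema.
# The sandbox runtime would turn this into a callable binding.
weather_tool = {
    "name": "get_weather",
    "description": "Return the current temperature for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# With programmatic tool calling, the model would then emit code such as
# (get_weather being the hypothetical binding generated from the schema):
#   temps = {c: get_weather(city=c) for c in ["Paris", "Tokyo"]}
#   result = max(temps, key=temps.get)
```

The key difference from traditional function calling is what the model emits: not one JSON call per tool invocation, but a program that composes many invocations and returns only its final result.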