Anthropic Just Killed Tool Calling
Anthropic's New Developer Tools: Programmatic Tool Calling
Introduction to Programmatic Tool Calling
- Anthropic has introduced significant developer tooling with the release of Sonnet 4.6, focusing on programmatic tool calling.
- This feature lets agents invoke tools from code instead of loading every call and result into the context window, saving tokens and improving accuracy.
Advantages Over Traditional Methods
- Anthropic's engineering work here argues that invoking tools through code is more effective than emitting traditional JSON tool-call structures.
- LLMs (Large Language Models) are better suited to code execution than to conventional tool calling, because their training data contains vastly more real code than synthetic tool-call formats.
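The contrast can be sketched concretely. The JSON payload and the `get_orders`/`summarize_spend` names below are hypothetical, chosen purely to illustrate the two styles:

```python
# Traditional tool calling: the model emits a JSON blob per call, and
# every intermediate result flows back through the context window.
traditional_call = {
    "name": "get_orders",
    "arguments": {"customer_id": "C-42", "limit": 100},
}

# Programmatic style: the model writes ordinary code, the kind it has
# seen billions of lines of during training. get_orders here is a
# hypothetical tool binding injected into the sandbox, not a real API.
def summarize_spend(get_orders):
    orders = get_orders(customer_id="C-42", limit=100)
    return sum(o["amount"] for o in orders)  # aggregation happens in code
```

In the second style, the 100 order records never need to pass through the model at all; only the single aggregated number does.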
Industry Impact and Adoption Trends
- Anthropic's innovations often lead to industry-wide adoption, as seen with MCP (Model Context Protocol).
- The context window problem is exacerbated by protocols like MCP, leading to inefficient use of space during user interactions.
Context Engineering and Its Importance
- Context engineering aims to optimize what information is loaded into the context window, discarding unnecessary data.
- Tool calls significantly contribute to context pollution; thus, optimizing them can enhance performance.
How Programmatic Tool Calling Works
- In programmatic tool calling, the agent writes code in a sandboxed environment, and that code invokes the tools, rather than the model making each call directly.
- This reduces token usage because intermediate results stay inside the sandbox; only the final output is returned to the model.
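The flow above can be sketched in a few lines of Python. Everything here (`run_in_sandbox`, `fetch_logs`, the model-written script) is an illustrative stand-in, not Anthropic's actual API:

```python
def run_in_sandbox(script, tools):
    """Execute model-written code with tool functions injected; return
    only what the script explicitly assigns to `result`."""
    namespace = dict(tools)
    exec(script, namespace)      # intermediate data lives only in here
    return namespace["result"]   # the sole output that re-enters context

# A script the model might write: boil 1,000 log rows down to one line.
model_script = """
rows = fetch_logs(limit=1000)
errors = [r for r in rows if r["level"] == "ERROR"]
result = f"{len(errors)} errors out of {len(rows)} rows"
"""

# Hypothetical tool binding: every 10th row is an error.
tools = {"fetch_logs": lambda limit: [
    {"level": "ERROR" if i % 10 == 0 else "INFO"} for i in range(limit)
]}

summary = run_in_sandbox(model_script, tools)
```

The 1,000 rows are fetched and filtered inside the sandbox; only the one-line summary string would be appended to the model's context.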
Timeline of Developments in Programmatic Tool Calling
- Cloudflare published a report in September 2025 advocating for programmatically invoking tools within an MCP server, showing potential token savings of 30%–80%.
- Anthropic echoed these findings in November 2025 with an article on using code execution with MCP to build more efficient agents.
Recent Advancements and Community Response
- Anthropic released advanced tools including a search function that optimizes token usage further.
- The open-source community rapidly adopted these concepts, leading to implementations across various projects such as Block's Goose agent and LiteLLM.
Conclusion on Current State and Future Directions
- As of now, these advancements have moved beyond beta testing into full support with dynamic filtering capabilities for web searches.
API Enhancements and Tool Support
Key Insights on LLMs and Code Generation
- Version 5.2 introduces support for over 20 different tools via the API, extending the capabilities of large language models (LLMs).
- LLMs are trained on billions of lines of code, making them particularly effective at generating and understanding code, whereas synthetic JSON tool-calling formats are far rarer in their training data.
- Anthropic's Sonnet 4.6 release includes two new tools, web search and dynamic filtering, which improve how agents interact with data.
Improvements in Web Search Capabilities
- The new features allow the model to write and execute code during web searches, filtering results before they enter the context window to enhance accuracy.
- Initial tests showed an average improvement of 11% in performance metrics while reducing input tokens by 24%, indicating significant efficiency gains.
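A minimal sketch of that filtering step, assuming a hypothetical `filter_hits` helper operating on already-fetched search hits (this is not Anthropic's actual search API):

```python
# Instead of pasting every search hit into the prompt, model-written
# code keeps only the hits matching the query terms, so far fewer
# input tokens ever reach the model.
def filter_hits(hits, must_contain):
    keep = [h for h in hits if must_contain.lower() in h["snippet"].lower()]
    return keep[:3]  # cap how much re-enters the context window

hits = [
    {"url": "a", "snippet": "Programmatic tool calling saves tokens"},
    {"url": "b", "snippet": "Unrelated celebrity news"},
    {"url": "c", "snippet": "Tool calling via code execution"},
]
relevant = filter_hits(hits, "tool calling")  # keeps "a" and "c" only
```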
Benchmark Performance Analysis
- The BrowseComp benchmark assesses an agent's ability to navigate websites for hard-to-find information; Sonnet improved from 33% to 46%, while Opus increased from 45% to 61%.
- In the deep-search QA benchmark, which evaluates finding multiple correct answers through web searches, Sonnet's F1 score rose from 52% to 59%.
Token Cost Considerations
- Token costs vary with how much filtering code the model generates: Sonnet's price-weighted token count decreased, while Opus's increased because it wrote more extensive filtering code.
- This indicates that a reduction in output tokens does not always correlate with lower token costs; careful consideration is needed when evaluating performance.
Utilizing New Features Effectively
- Users employing the search API need only enable data fetching; Anthropic will automatically optimize token usage by returning only relevant information.
- Additional tools have exited beta status, including code execution sandboxes and programmatic tool calling, along with detailed documentation provided for user guidance.
Implementation Structure
- To implement these tools effectively, users must define what each tool does alongside its input/output schema within a structured format.
- Instead of emitting traditional function calls, models like Claude now generate code directly to execute specific tasks, a pattern that is becoming standard industry practice.
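A tool definition in this style typically pairs a natural-language description with a JSON schema for the inputs. The field names below mirror common conventions for LLM tool definitions and are assumptions, not Anthropic's exact schema:

```python
# Illustrative tool definition: name, description, and input schema.
# The sandbox runtime would turn this into a callable binding.
weather_tool = {
    "name": "get_weather",
    "description": "Return the current temperature for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# With programmatic tool calling, the model would then emit code such as
# (get_weather being the hypothetical binding generated from the schema):
#   temps = {c: get_weather(city=c) for c in ["Paris", "Tokyo"]}
#   result = max(temps, key=temps.get)
```

The key difference from traditional function calling is what the model emits: not one JSON call per tool invocation, but a program that composes many invocations and returns only its final result.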