Claude Code Can Now Control Your Browser (Thanks to Vercel)
Introduction to Agent Browser
Overview of Agent Browser
- Agent Browser is an open-source headless browser CLI developed by a single Vercel employee. It lets agents perform tasks in the browser such as dragging and dropping, uploading images, and toggling offline mode.
- The speaker questions whether this tool is needed when traditional browsers offer more features, and hints at further developments from Vercel in the agent browser space.
The Future of AI Agents
AI Agents in Development
- By 2026, it is anticipated that AI agents will handle writing, reviewing, and testing code autonomously, reducing reliance on IDEs as developers shift towards terminal-based workflows.
- The need for agents to interact with and test their own code is emphasized to avoid tedious manual testing processes.
Functionality of Agent Browser
Key Features
- The tool allows for creating accessibility snapshots that provide a tree structure of page elements and supports reference-based actions for element interaction.
- Semantic locators are available for finding elements by attributes such as ARIA role or text content.
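The snapshot and locator ideas above can be sketched in a few lines of Python. This is an illustrative model only, not Agent Browser's implementation: the `Node` class, the `e1`, `e2`, ... ref format, and the `find_by_role` / `find_by_text` helpers are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One element in a simplified accessibility tree (hypothetical model)."""
    role: str
    name: str = ""
    children: list["Node"] = field(default_factory=list)
    ref: Optional[str] = None  # filled in by snapshot()


def snapshot(root: Node) -> tuple[str, dict[str, Node]]:
    """Walk the tree depth-first, assign refs e1, e2, ... and render an outline."""
    lines: list[str] = []
    refs: dict[str, Node] = {}

    def visit(node: Node, depth: int) -> None:
        node.ref = f"e{len(refs) + 1}"
        refs[node.ref] = node
        label = f' "{node.name}"' if node.name else ""
        lines.append(f"{'  ' * depth}- {node.role}{label} [ref={node.ref}]")
        for child in node.children:
            visit(child, depth + 1)

    visit(root, 0)
    return "\n".join(lines), refs


def find_by_role(refs: dict[str, Node], role: str, name: Optional[str] = None) -> list[str]:
    """Semantic lookup: match on role, optionally on the exact accessible name."""
    return [r for r, n in refs.items() if n.role == role and (name is None or n.name == name)]


def find_by_text(refs: dict[str, Node], text: str) -> list[str]:
    """Semantic lookup: case-insensitive substring match on the accessible name."""
    return [r for r, n in refs.items() if text.lower() in n.name.lower()]


page = Node("form", "Login", [
    Node("textbox", "Email"),
    Node("textbox", "Password"),
    Node("button", "Sign in"),
])
outline, refs = snapshot(page)
print(outline)
print(find_by_role(refs, "button", "Sign in"))  # ['e4']
print(find_by_text(refs, "pass"))               # ['e3']
```

The point of the ref table is that an agent can read the outline once and then act on a short, stable handle instead of re-deriving a selector for every action.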
Demo: Using Agent Browser
Practical Application
- A demo showcases a login page where the agent attempts to implement dark mode but initially fails to do so correctly.
- The agent uses commands through the agent browser interface to identify issues with dark mode functionality.
Agent's Problem-Solving Process
Fixing Issues
- The agent runs agent-browser --help to check the available functionality; no slash commands or skills are required.
- After taking screenshots before and after fixing dark mode issues, the agent successfully implements changes leading to a functional dark mode.
Validation Testing by Agents
Addressing Additional Issues
- Another issue regarding login validation was addressed by an additional agent running tests using commands from the agent browser.
- The process included creating bash scripts for automated testing scenarios which validate user input effectively.
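A rough sketch of how such a test script might be generated is shown below. The scenario data, the CSS selectors (`#email`, `#password`, `#submit`), and the URL are hypothetical, and the subcommand names used here (`open`, `fill`, `click`, `snapshot`) are assumptions that should be checked against the CLI's own help output. The code only builds the script text; it does not run anything.

```python
import shlex

# Hypothetical validation scenarios for a login form.
SCENARIOS = [
    {"name": "empty-email", "email": "", "password": "hunter2"},
    {"name": "malformed-email", "email": "not-an-email", "password": "hunter2"},
    {"name": "short-password", "email": "user@example.com", "password": "a"},
]


def plan(scenario: dict, url: str) -> list[list[str]]:
    """Return the CLI invocations for one scenario (assumed subcommands)."""
    return [
        ["agent-browser", "open", url],
        ["agent-browser", "fill", "#email", scenario["email"]],
        ["agent-browser", "fill", "#password", scenario["password"]],
        ["agent-browser", "click", "#submit"],
        # Snapshot last so the agent can read any validation message shown.
        ["agent-browser", "snapshot"],
    ]


def to_script(scenarios: list[dict], url: str) -> str:
    """Render all scenarios as a bash script with safely quoted arguments."""
    lines = ["#!/usr/bin/env bash", "set -euo pipefail"]
    for scenario in scenarios:
        lines.append(f"# scenario: {scenario['name']}")
        lines.extend(shlex.join(cmd) for cmd in plan(scenario, url))
    return "\n".join(lines) + "\n"


print(to_script(SCENARIOS, "http://localhost:3000/login"))
```

Using `shlex.join` matters here because validation tests deliberately feed in empty strings and odd characters, which would break a naively concatenated command line.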
Architecture Behind Agent Browser
Technical Insights
- Commands sent from agents are processed by a Rust binary that converts them into JSON format for further handling.
- This architecture supports multiple sessions over Unix sockets, each managing a Chromium browser, while Rust's low resource overhead keeps the tool fast.
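The command-to-JSON step over a Unix socket can be illustrated with a small Python sketch. The message fields (`session`, `action`, `args`) and the newline-delimited framing are assumptions chosen for the example, not the tool's actual wire format, and the real front end is a Rust binary rather than Python. `socket.socketpair()` stands in for the connection between the CLI and whatever process manages the browser.

```python
import json
import socket


def encode_command(argv: list[str], session: str = "default") -> bytes:
    """Turn CLI-style arguments into one newline-terminated JSON message."""
    if not argv:
        raise ValueError("expected at least an action name")
    action, *args = argv
    message = {"session": session, "action": action, "args": args}
    return (json.dumps(message) + "\n").encode("utf-8")


def decode_command(line: bytes) -> dict:
    """Parse a message produced by encode_command."""
    return json.loads(line.decode("utf-8"))


def roundtrip(argv: list[str], session: str = "default") -> dict:
    """Send a command across a connected Unix socket pair and decode it."""
    client, daemon = socket.socketpair()  # AF_UNIX pair on POSIX systems
    try:
        client.sendall(encode_command(argv, session))
        client.shutdown(socket.SHUT_WR)  # signal end of message to the reader
        buf = b""
        while chunk := daemon.recv(4096):
            buf += chunk
        return decode_command(buf)
    finally:
        client.close()
        daemon.close()


print(roundtrip(["click", "@e4"], session="login-test"))
```

Carrying a session name in every message is one simple way several agents could share the same machinery without stepping on each other's browser state.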
Comparison with Other Tools
Evaluating Alternatives
- Unlike traditional browsers or the Playwright MCP server, which are presented as able to operate without an external agent, Agent Browser must be driven through its own command structure, though it offers capabilities tailored for automation.
Agent Browser vs. Playwright MCP Server: A Comparative Analysis
Overview of Agent Browser
- Agent Browser is designed to be simpler and requires an external agent, such as Cursor or Claude Code, to operate it.
- Interaction with the Agent Browser occurs through CLI commands, emphasizing its straightforward nature.
Comparison with Playwright MCP Server
- Currently, the Agent Browser only supports Chromium browsers, excluding Firefox and Safari, which are supported by the Playwright MCP server.
- The Playwright MCP server offers comprehensive functionality that aligns with all capabilities of Playwright but is tailored specifically for agents.
Considerations for Tool Selection
- A potential drawback of using multiple MCP tools is the risk of confusing agents due to an overwhelming number of available options.
- The choice between using the Playwright MCP server and the Agent Browser largely depends on specific use cases and user preferences.
Personal Preference Insights
- The speaker expresses a preference for the simplicity and ease of installation offered by the Agent Browser.
- As a primary user of Chromium browsers, they are not concerned about the lack of support for Firefox or Safari.