Claude Code Can Now Control Your Browser (Thanks to Vercel)

Introduction to Agent Browser

Overview of Agent Browser

  • Agent Browser is an open-source headless browser CLI developed by a single Vercel employee. It lets agents perform tasks in the browser such as dragging and dropping, uploading images, and toggling offline mode.
  • The speaker questions why one would use this tool over traditional browsers with more features, and hints at further developments from Vercel in the agent browser space.

The Future of AI Agents

AI Agents in Development

  • By 2026, it is anticipated that AI agents will handle writing, reviewing, and testing code autonomously, reducing reliance on IDEs as developers shift towards terminal-based workflows.
  • The need for agents to interact with and test their own code is emphasized to avoid tedious manual testing processes.

Functionality of Agent Browser

Key Features

  • The tool allows for creating accessibility snapshots that provide a tree structure of page elements and supports reference-based actions for element interaction.
  • Semantic locators are available for finding elements by attributes such as ARIA role or text content.
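
To make the ref-based idea concrete, here is a minimal Python sketch of a toy accessibility tree in which every interactive node receives a short reference that later actions can target. The node shape, the e1-style ref format, the set of interactive roles, and all function names are assumptions made for illustration; this is not agent-browser's actual data model or output format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One element in a toy accessibility tree (hypothetical shape)."""
    role: str
    name: str = ""
    children: List["Node"] = field(default_factory=list)
    ref: Optional[str] = None


# Assumption: only these roles are worth handing to the agent as targets.
INTERACTIVE_ROLES = {"button", "textbox", "link", "checkbox"}


def assign_refs(root: Node) -> Dict[str, Node]:
    """Walk the tree in document order and give each interactive node a ref."""
    refs: Dict[str, Node] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if node.role in INTERACTIVE_ROLES:
            node.ref = f"e{len(refs) + 1}"
            refs[node.ref] = node
        # Push children reversed so the first child is popped first.
        stack.extend(reversed(node.children))
    return refs


def render(node: Node, depth: int = 0) -> str:
    """Render the tree as indented text, similar in spirit to a snapshot."""
    label = f'{node.role} "{node.name}"' if node.name else node.role
    if node.ref:
        label += f" [ref={node.ref}]"
    lines = ["  " * depth + "- " + label]
    lines.extend(render(child, depth + 1) for child in node.children)
    return "\n".join(lines)


def find_by_role(refs: Dict[str, Node], role: str, name: str) -> Optional[str]:
    """Toy semantic locator: look up a ref by role and accessible name."""
    for ref, node in refs.items():
        if node.role == role and node.name == name:
            return ref
    return None


page = Node("form", "Login", [
    Node("textbox", "Email"),
    Node("textbox", "Password"),
    Node("button", "Sign in"),
])
refs = assign_refs(page)
snapshot_text = render(page)
print(snapshot_text)
print(find_by_role(refs, "button", "Sign in"))
```

The point of the design is that an agent never has to guess a CSS selector: it reads the snapshot, picks a ref, and the ref maps back to exactly one element.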

Demo: Using Agent Browser

Practical Application

  • A demo showcases a login page where the agent attempts to implement dark mode but initially fails to do so correctly.
  • The agent uses commands through the agent browser interface to identify issues with dark mode functionality.

Agent's Problem-Solving Process

Fixing Issues

  • The agent runs agent-browser --help to see what the tool can do; no slash commands or skills are required.
  • The agent takes screenshots before and after its fix, which confirm that dark mode now works.

Validation Testing by Agents

Addressing Additional Issues

  • Another issue regarding login validation was addressed by an additional agent running tests using commands from the agent browser.
  • The process included creating bash scripts for automated testing scenarios which validate user input effectively.
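
A scripted validation scenario of this kind might look roughly like the following. The source describes bash scripts; this sketch uses Python to build the same sort of command sequence. The subcommand names open, fill, click, and snapshot come from the video description, while the argument order, the CSS selectors, the URL, and the helper names are assumptions. By default the script only prints the commands, so it runs without the CLI installed.

```python
import shutil
import subprocess
from typing import List

CLI = "agent-browser"


def validation_scenario(url: str, email: str, password: str) -> List[List[str]]:
    """Build the commands for one login attempt.

    The selectors (#email, #password, #submit) are hypothetical; a real
    script would take them from the page under test or from snapshot refs.
    """
    return [
        [CLI, "open", url],
        [CLI, "fill", "#email", email],
        [CLI, "fill", "#password", password],
        [CLI, "click", "#submit"],
        [CLI, "snapshot"],  # inspect the page for validation messages
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> List[str]:
    """Execute each command, or just describe it when dry_run is set
    or the CLI is not on PATH."""
    outputs = []
    for argv in commands:
        if dry_run or shutil.which(CLI) is None:
            outputs.append("DRY RUN: " + " ".join(argv))
            continue
        result = subprocess.run(argv, capture_output=True, text=True, check=True)
        outputs.append(result.stdout)
    return outputs


# Invalid input on purpose: the form should reject it.
commands = validation_scenario("http://localhost:5173/login", "not-an-email", "")
for line in run(commands):
    print(line)
```

Keeping the scenario as plain data (a list of argument lists) makes it easy to add more cases, such as a valid email with a too-short password, without touching the runner.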

Architecture Behind Agent Browser

Technical Insights

  • Commands sent by the agent are parsed by a Rust binary, which converts them into JSON for further handling.
  • The JSON is passed over Unix sockets, which allows multiple sessions to run at once, each managing a Chromium browser. The Rust front end keeps the CLI fast and light on resources.
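
The flow described here (CLI arguments turned into JSON, then sent over a Unix socket to a per-session handler) can be sketched in a few lines. This is an illustration in Python rather than Rust, and the message fields (session, action, args), the reply shape, and the function names are all hypothetical; the source does not specify the real wire format.

```python
import json
import socket
from typing import Dict, List


def encode_command(session: str, argv: List[str]) -> bytes:
    """Turn CLI-style arguments into one newline-terminated JSON message."""
    message = {"session": session, "action": argv[0], "args": argv[1:]}
    return (json.dumps(message) + "\n").encode("utf-8")


def handle(raw: bytes, sessions: Dict[str, List[str]]) -> bytes:
    """Stand-in for the browser-side process: record the action per session.

    A real handler would drive a browser here instead of keeping a list.
    """
    message = json.loads(raw.decode("utf-8"))
    history = sessions.setdefault(message["session"], [])
    history.append(message["action"])
    reply = {"ok": True, "session": message["session"], "handled": len(history)}
    return (json.dumps(reply) + "\n").encode("utf-8")


# socketpair() returns two already-connected sockets (Unix-domain on
# Unix-like systems), which is enough to demonstrate the round trip
# inside a single process.
client, daemon = socket.socketpair()
sessions: Dict[str, List[str]] = {}
try:
    client.sendall(encode_command("default", ["open", "example.com"]))
    # The message is tiny, so one recv() suffices for this demo; real code
    # would read until the newline delimiter.
    daemon.sendall(handle(daemon.recv(4096), sessions))
    reply = json.loads(client.recv(4096).decode("utf-8"))
finally:
    client.close()
    daemon.close()

print(reply)
```

Tagging each message with a session name is one simple way a single socket protocol could serve several independent browsers at once.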

Comparison with Other Tools

Evaluating Alternatives

  • Unlike alternatives such as Browser Use or the Playwright MCP server, Agent Browser does nothing on its own: an external agent must drive it through its specific CLI commands. In exchange, it offers capabilities tailored to agent automation.

Agent Browser vs. Playwright MCP Server: A Comparative Analysis

Overview of Agent Browser

  • Agent Browser is designed to be simpler and requires an external agent, such as Cursor or Claude Code, to operate it.
  • Interaction with the Agent Browser occurs through CLI commands, emphasizing its straightforward nature.

Comparison with Playwright MCP Server

  • Currently, the Agent Browser only supports Chromium browsers, excluding Firefox and Safari, which are supported by the Playwright MCP server.
  • The Playwright MCP server offers comprehensive functionality that aligns with all capabilities of Playwright but is tailored specifically for agents.

Considerations for Tool Selection

  • A potential drawback of using multiple MCP tools is the risk of confusing agents due to an overwhelming number of available options.
  • The choice between using the Playwright MCP server and the Agent Browser largely depends on specific use cases and user preferences.

Personal Preference Insights

  • The speaker expresses a preference for the simplicity and ease of installation offered by the Agent Browser.
  • As a primary user of Chromium browsers, they are not concerned about the lack of support for Firefox or Safari.

Video description

Agent Browser is a headless browser automation CLI that Vercel developed for AI agents (like Claude Code), created by Chris Tate in just a single weekend. This powerful tool combines a fast Rust CLI with Node.js to give AI agents complete control over web browsers through simple commands like open, click, fill, and snapshot, all with a unique ref-based system that makes element selection deterministic and AI-friendly. With support for multiple browser engines, session management, and seamless integration into existing AI workflows, Agent Browser provides a robust solution for web automation that works perfectly with tools like Claude Code and other AI agents.

Relevant Links

  • Tweet from Chris Tate: https://x.com/ctatedev/status/2010400005887082907
  • Agent browser GH: https://github.com/vercel-labs/agent-browser

Chapters

  • 0:00 Intro
  • 0:22 Agents are taking over in 2026
  • 0:52 Introducing agent-browser by Vercel
  • 1:30 Agent browser demo 1 on React + Vite proj
  • 3:00 Agent browser fixes form validation
  • 4:17 How agent browser works
  • 5:06 Agent browser vs Browser Use vs Playwright MCP
  • 6:30 My thoughts on agent-browser