This AI Agent can actually self-evolve… just watch
Introduction to Space Agent
Overview of Self-Updating Agents
- The Space Agent is described as the first self-updating agent, capable of creating user interfaces (UIs) that other agents cannot.
- Jan, the creator of Space Agent and founder of Agent Zero, introduces the product and its unique capabilities.
Limitations of Traditional Agents
- Traditional agents operate on limited layers, often restricted by their communication interfaces like WhatsApp or Telegram.
- Even with rich web UIs, traditional agents can only perform limited actions without modifying backend processes.
Key Features of Space Agent
Client-Side Operation
- Unlike traditional agents, Space Agent runs client-side in a JavaScript runtime, allowing it to mutate the displayed page directly.
Dynamic Communication
- The agent can dynamically render various content types such as prices, charts, news, and even games based on user requests.
User Experience with Space Agent
Initial User Interaction
- Users are presented with an empty space upon entering Space Agent for the first time; no setup is required for use.
Open Source Accessibility
- Users can try out Space Agent for free via GitHub; it offers a guest account option requiring minimal setup.
Demonstration of Functionality
Example Interaction
- A demonstration shows how users can ask about weather conditions in Prague and receive responses both in chat and through widgets.
Technical Insights
- The agent's response mechanism involves generating raw text without additional formatting or tokens unless necessary.
Efficiency and Token Management
Token Efficiency Explained
- The conversation loop is designed to be token-efficient; responses are generated using minimal tokens while maintaining clarity.
Simplified Response Mechanism
- The agent responds in plain text when possible, avoiding unnecessary complexity in its output.
Token Efficiency and System Design
Token Management in AI Models
- The discussion highlights the efficiency of generating widgets with minimal token usage, noting that it only took about 280 tokens to create one.
- As AI models become more expensive, with GPT 5.5 Pro costing $180 per million output tokens, there is growing interest in "tokenomics": optimizing the cost of running inference.
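To make the two figures above concrete, here is a minimal arithmetic sketch. It assumes, purely for illustration, that the 280-token widget was billed at the quoted $180 per million output tokens; the talk does not say which model produced it, and the function name is hypothetical.

```python
from decimal import Decimal

# Price quoted in the talk: $180 per one million output tokens.
PRICE_PER_MILLION_OUTPUT_TOKENS = Decimal("180")


def output_cost_usd(output_tokens: int,
                    price_per_million: Decimal = PRICE_PER_MILLION_OUTPUT_TOKENS) -> Decimal:
    """Return the output-token cost in US dollars for one response."""
    return Decimal(output_tokens) * price_per_million / Decimal(1_000_000)


# Assumption: the ~280-token widget is billed at the quoted price.
widget_cost = output_cost_usd(280)
print(f"One 280-token widget costs about ${widget_cost}")
```

Under that assumption a single widget costs roughly five cents of output, which is why keeping each response short matters once the same loop runs many times a day.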
Prompt Optimization Techniques
- The speaker describes an automated research process that uses Codex to iterate on system prompts, generating three variants per round for testing: conservative, medium change, and wild.
- Continuous optimization led to version number 250 of the prompt, focusing on token efficiency and reliability through iterative testing.
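The iteration described above can be sketched as a simple keep-the-best loop. This is not the speaker's actual harness: `optimize_prompt`, `toy_propose`, and `toy_score` are hypothetical names, and the scoring rule (shorter prompt wins, but it must still mention "widget") is only a stand-in for real token-efficiency and reliability tests.

```python
from typing import Callable

VARIANT_STYLES = ("conservative", "medium", "wild")


def optimize_prompt(base_prompt: str,
                    propose: Callable[[str, str], str],
                    score: Callable[[str], float],
                    rounds: int) -> tuple[str, int]:
    """Keep the best-scoring prompt across rounds; return it with its version number."""
    best_prompt, best_score, version = base_prompt, score(base_prompt), 1
    for _ in range(rounds):
        # Three candidates per round, all derived from the current best prompt.
        candidates = [propose(best_prompt, style) for style in VARIANT_STYLES]
        for candidate in candidates:
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best_prompt, best_score = candidate, candidate_score
                version += 1  # every accepted change bumps the version
    return best_prompt, version


FILLER = {"please", "kindly", "very", "really"}


def toy_propose(prompt: str, style: str) -> str:
    """Hypothetical variant generator: bolder styles remove more filler words."""
    words = prompt.split()
    budget = {"conservative": 1, "medium": 2, "wild": len(words)}[style]
    kept = []
    for word in words:
        if word.lower() in FILLER and budget > 0:
            budget -= 1
            continue
        kept.append(word)
    return " ".join(kept)


def toy_score(prompt: str) -> float:
    """Hypothetical score: fail if the key instruction is lost, else prefer fewer words."""
    if "widget" not in prompt:
        return float("-inf")
    return -len(prompt.split())


best, version = optimize_prompt(
    "please kindly render a very really small widget", toy_propose, toy_score, rounds=3
)
print(best, version)
```

In a real setup `propose` would call a coding agent and `score` would run a test suite over sample conversations; the loop structure stays the same.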
Dynamic User Interface Creation
- A demonstration of a notes app showcases how multiple widgets can cooperate within a dynamic space that allows resizing and rearranging elements easily.
- The implications suggest that users can design custom UIs without traditional coding or server hosting; everything runs in the browser using JavaScript.
Future Operating Systems Concept
- The vision presented indicates a future where operating systems may not require traditional apps or interfaces since agents could manage tasks like email sorting directly through user commands.
Notes App Functionality Overview
- The notes app supports folder management, visual editing, conversion to a markdown view, and copy-pasting of images and attachments, so it functions as a complete notes application.
Local Data Management and Security
Browser Limitations and Backend Solutions
- Browsers have security limitations preventing file management on host systems; thus, a thin Node.js backend manages user permissions and file storage securely.
Native App Advantages
- Users are encouraged to download the native app for local data storage without server communication; this setup enhances privacy by keeping all files on the user's machine.
Dashboard Creation Using Space Agent
Surveillance Dashboard Example
- A surveillance dashboard was created quickly by selecting public IP cameras; the agent generated this interface in just minutes after initial camera selection.
Time Investment for Development
- Building the notes app required more time due to detailed instructions (approximately 10 minutes), while simpler dashboards were constructed rapidly (1–2 minutes).
Space Agent: Orchestrating Dynamic Systems
Overview of Space Agent's Capabilities
- Space Agent excels in orchestrating multiple systems, allowing for dynamic UI management and control over various agents from a single dashboard.
- The speaker connects Space Agent to an instance of Agent Zero on their local machine, creating a chat interface that utilizes the Agent Zero API for communication.
Functionality and Performance
- The integration features an embedded browser window that allows full control over the Agent Zero web UI without relying on APIs or shortcuts.
- Initial prototypes are typically fast due to their simplicity; however, this implementation maintained speed even after adding functionalities.
Technical Implementation
- The agent transcribes web pages into a simplified format, enabling it to interact with elements like buttons and input fields by referencing them with unique identifiers.
- Commands sent by the agent (e.g., "click button 25") yield updated transcriptions of the page state, facilitating seamless interaction.
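The transcribe-then-act loop described above might look roughly like the following. It is a toy model under stated assumptions, not Space Agent's implementation: the `Element` class, the `[n] kind: label` line format, and the `type input N TEXT` command are all invented for illustration; only the `click button N` command shape comes from the talk.

```python
from dataclasses import dataclass
from typing import Callable, Optional

INTERACTIVE = {"button", "input"}


@dataclass
class Element:
    kind: str                                   # "text", "button" or "input"
    label: str
    value: str = ""
    on_click: Optional[Callable[[], None]] = None


def transcribe(elements):
    """Render the page as short lines; only interactive elements get a numeric id."""
    lines, by_id = [], {}
    for element in elements:
        if element.kind in INTERACTIVE:
            ref = len(by_id) + 1
            by_id[ref] = element
            suffix = f' = "{element.value}"' if element.kind == "input" else ""
            lines.append(f"[{ref}] {element.kind}: {element.label}{suffix}")
        else:
            lines.append(element.label)
    return "\n".join(lines), by_id


def run_command(elements, command):
    """Apply 'click button N' or 'type input N TEXT', then return a fresh transcription."""
    _, by_id = transcribe(elements)
    verb, kind, ref, *rest = command.split(maxsplit=3)
    target = by_id[int(ref)]
    if target.kind != kind:
        raise ValueError(f"element {ref} is a {target.kind}, not a {kind}")
    if verb == "click" and target.on_click:
        target.on_click()
    elif verb == "type":
        target.value = rest[0] if rest else ""
    return transcribe(elements)[0]


def make_page():
    """Hypothetical page: a counter label, a name field and an increment button."""
    counter = Element("text", "Count: 0")

    def increment():
        counter.label = f"Count: {int(counter.label.split(': ')[1]) + 1}"

    return [counter, Element("input", "Name"), Element("button", "Increment", on_click=increment)]


page = make_page()
print(transcribe(page)[0])
print(run_command(page, "click button 2"))
```

Returning the new transcription from every command means the model always sees the page state that resulted from its own action.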
Challenges and Solutions
- Web crawling presents challenges due to complex structures like iframes and shadow DOMs; solutions involved injecting hacks into the browser renderer for better accessibility.
- A combination of AI assistance and manual adjustments led to an effective transcription method that optimizes interactive element recognition while minimizing token usage.
Token Management Strategy
- Historical states are not retained in memory once they become outdated, preventing unnecessary token consumption during interactions.
- By managing transient data effectively, the system maintains efficiency in token usage while ensuring relevant context is always available for ongoing conversations.
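One way to picture this strategy is a filter over the conversation history that keeps every chat message but only the newest page state. The message shape (`kind`, `text`) and the function names are assumptions made for this sketch, and the word count is a crude proxy rather than a real tokenizer.

```python
def rough_token_count(text: str) -> int:
    """Crude proxy for token usage: whitespace-separated words, not a real tokenizer."""
    return len(text.split())


def drop_outdated_states(messages: list[dict]) -> list[dict]:
    """Keep every chat message but only the most recent page-state message."""
    state_positions = [i for i, m in enumerate(messages) if m["kind"] == "page_state"]
    latest = state_positions[-1] if state_positions else None
    return [m for i, m in enumerate(messages)
            if m["kind"] != "page_state" or i == latest]


history = [
    {"kind": "chat", "text": "open the settings page"},
    {"kind": "page_state", "text": "[1] button: Home [2] button: Settings"},
    {"kind": "chat", "text": "click button 2"},
    {"kind": "page_state", "text": "[1] button: Back [2] input: Name"},
]

trimmed = drop_outdated_states(history)
before = sum(rough_token_count(m["text"]) for m in history)
after = sum(rough_token_count(m["text"]) for m in trimmed)
print(f"{before} -> {after} rough tokens")
```

The saving grows with conversation length, because each additional interaction would otherwise leave another full page transcription in the context.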
Future Enhancements
- There is a desire to demonstrate live updates within custom UI elements, showcasing how easily modifications can be made using Space Agent's capabilities.
- The speaker suggests starting with pre-built interfaces or conducting research-driven modifications as part of future demonstrations.
Research Harness Development and UI Interaction
Overview of the Research Harness
- The UI is designed for two-way communication between the agent and the interface it created. A research harness has been built and tested on Agent Zero and GPT 5.5.
- Custom instructions can be set for each space, enabling tailored interactions based on specific requirements or characteristics.
Testing the Research Functionality
- During testing with Claude Mythos, the agent receives messages from the UI and updates a research template in real time, gathering notes and creating a markdown file based on predefined templates.
- The flexibility of creating custom UIs allows users to design interfaces beyond standard options like WhatsApp or terminal interfaces.
Enhancements in Output Formatting
- Suggestions were made to improve formatting by adding colors or converting outputs into PDF format for better presentation.
- The chat interface offers different display modes; compact mode is preferred for clarity when multiple elements are present.
Exporting Features and Persistence
- A new feature was added to export research as a PDF, which enhances usability since this functionality was not originally included in the space's features.
- The persistence of data is highlighted; refreshing the page does not lose any changes made by the agent due to how widgets are created and stored.
Creating New UI Elements
Adding Interactive Components
- There’s an intention to create various UI elements from scratch, starting with a Kanban board styled similarly to Trello, emphasizing colorful designs.
- Developers are encouraged to make components extensible while considering performance; faster models than Opus 4.7 can be utilized during development.
Performance Insights
- Initial development used GPT 5.4 mini for quick responses; Gemma 4 running locally performed comparably, although it was less reliable than models such as Claude Sonnet or Claude Opus.
Additional Features Exploration
- Following the creation of a Kanban board, there’s interest in integrating diverse functionalities such as stock price charts for companies like Nvidia, Apple, and Google (Alphabet), exploring API limitations for data retrieval.
Agent Capabilities and Limitations
Overview of Agent Functionality
- The training data for agents is crucial as it influences future iterations, similar to a tech stack that evolves over time.
- Running the agent in a browser prevents API blocks typically encountered by VPS-hosted agents, allowing smoother operation.
Limitations of Space Agent
- Space Agent is not suitable for tasks requiring operating system access or low-level operations like installing Linux packages.
- It cannot perform background jobs since it operates on the front end; turning off the computer halts its functionality.
- Future developments may address background job capabilities, but current focus remains on user feedback and needs.
Potential and User Engagement
- Developers believe they have only tapped into about 5% of the agent's potential, indicating room for exploration and innovation.
- The concept of a "new paradigm" suggests that understanding and utilizing these agents will take time for both developers and users.
User Experience with Spaces
Functionality of Spaces
- Users can create multiple persistent spaces for different projects without incurring additional costs; each space is lightweight in storage.
Development Insights
- A step sequencer was developed quickly (around 20 minutes), showcasing rapid prototyping capabilities within the platform.
Accessibility of AI Tools
Engaging Non-Tech Users
- The platform serves as an entry point for non-tech individuals to experience AI's power through simple English commands without needing technical knowledge or installations.
Local Inference Requirements
- To utilize local inference, users need a powerful GPU; models from Hugging Face can be integrated easily into the system.
Understanding Space Agent and Its Development
Concept of Compatible Models
- The discussion begins with a community repository containing numerous models, noting that currently only the Gemma 4 model is relevant for writing JavaScript and understanding system properties.
Philosophy Behind Space Agent
- The creator shares their thought process in developing Space Agent, emphasizing a desire to connect various ideas and create an accessible agent similar to Agent Zero.
Self-Modifying Capabilities
- A key feature of Space Agent is its ability to fully control itself, including modifying its own modules dynamically based on user needs. This allows for personalized development environments.
Client-Side Processing Advantages
- Unlike other systems like Agent Zero that rely on backend processing (Python), Space Agent operates entirely on the client side, resulting in minimal server load (2% CPU usage).
Browser as the Chosen Platform
- The decision to use a browser was driven by the need for cross-platform compatibility without installation requirements. Browsers provide a standardized runtime across all operating systems.
Challenges with Browser Limitations
- While browsers offer great security features (sandboxing), they also present challenges such as cross-origin issues which require workarounds during development.
Development Process Using AI Tools
Utilizing Codex for Development
- The developer reveals that they did not write any code manually; instead, they used Codex to build the entire codebase.
Importance of Documentation
- An essential part of the workflow is an AGENTS.md documentation framework that helps the coding agent maintain context and prevents redundant work.
Comprehensive Documentation Strategy
- Every part of the system is documented in AGENTS.md files as it is developed, so the AI agent can understand the core principles and functionality.
This structured approach ensures clarity in both development and future adaptations of software projects.
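As an illustration of how hierarchical documentation could be gathered for a coding agent, the sketch below walks from the repository root down to the directory being edited and collects every documentation file on the way. The file name `AGENTS.md`, the directory layout, and the function `collect_agent_docs` are assumptions for this example; the talk does not describe the actual lookup rules.

```python
import tempfile
from pathlib import Path


def collect_agent_docs(repo_root: Path, target_dir: Path,
                       name: str = "AGENTS.md") -> list[Path]:
    """Return doc files from repo_root down to target_dir, most general first."""
    repo_root, target_dir = repo_root.resolve(), target_dir.resolve()
    # Raises ValueError if target_dir is not inside repo_root.
    relative = target_dir.relative_to(repo_root)
    docs, current = [], repo_root
    for part in ("", *relative.parts):
        if part:
            current = current / part
        candidate = current / name
        if candidate.is_file():
            docs.append(candidate)
    return docs


# Hypothetical layout: one root-level file plus one for the widgets module.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "app" / "widgets").mkdir(parents=True)
    (root / "AGENTS.md").write_text("Core principles")
    (root / "app" / "widgets" / "AGENTS.md").write_text("Widget conventions")
    for doc in collect_agent_docs(root, root / "app" / "widgets"):
        print(doc.relative_to(root.resolve()), "->", doc.read_text())
```

Ordering the files from general to specific lets module-level notes refine the project-wide rules that come before them.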
Agent Development Insights
Current State of Agent Development
- The speaker emphasizes the need for all preferences and decisions to be documented in markdown files within the repository, highlighting ongoing challenges with agent decision-making.
- Despite improvements over the past couple of years, there are still numerous instances where manual corrections are necessary. Documentation helps agents avoid repeating mistakes.
- Recent model releases such as GPT 5.5 and DeepSeek V4 are expected to significantly reduce these issues within 3 to 6 months.
Acceleration in Development
- The development timeline for Agent Zero took three months initially, while Space Agent was developed in about four weeks, showcasing a marked acceleration in productivity.
- The MVP (Minimum Viable Product) for Space Agent was completed even faster, indicating a substantial increase in efficiency—estimated at a speed-up factor of 12 to 20 times compared to earlier projects.
Coding Skills and AI Integration
- The speaker argues that coding remains an essential skill despite advancements in AI tools; they can quickly identify inefficiencies and guide AI towards better solutions.
- Continuous monitoring of AI outputs is crucial; the speaker actively checks multiple code execution windows for errors during development.
Tool Preferences: Codex vs. Claude Code
- The speaker prefers Codex over Claude Code: Codex works better with hierarchical documentation and keeps it consistently updated, whereas Claude Code tends to overlook documentation changes.
- An anecdote illustrates frustration with Claude Code's inefficiency when fixing minor issues, in contrast with Codex’s stronger performance.
Cost Efficiency and Value Proposition
- A comparison of subscription pricing suggests that OpenAI offers more inference capacity than competitors at a similar cost.
- While both tools have their strengths, Codex is preferred for complex tasks requiring significant refactoring or debugging due to its effectiveness.
Relationship Between Agent Zero and Space Agent
- Both products are developed by the same team but serve different use cases; users may utilize both simultaneously depending on their needs.
- Space Agent is designed as a more dynamic tool that interacts closely with users, whereas Agent Zero operates autonomously without user interruptions.
How to Use Space Agent for Interactive Browsing
Interactive Browser Features
- The browser is fully interactive, allowing users to navigate websites like Google and solve captchas directly.
- Users can set up an anonymous browsing experience through the Space Agent platform by visiting space-agent.ai and entering their API key.
Setting Up the API Key
- New users can find the option to set their LLM API key on the Space Agent website, which defaults to OpenRouter for easy access to various AI models.
- Users are guided on how to create an API key from OpenRouter, ensuring a seamless integration with Space Agent.
User Experience and Model Performance
- The initial user experience is designed to be impressive and tolerant, especially for new users interacting with the "Big Bang" space.
- Sonnet is highlighted as a balanced model in terms of cost and performance, with discussion of potential future releases (Sonnet 4.7 or 5).
Unique Features of Space Agent
- A notable feature of Space Agent is its time travel capability, allowing users to revert changes made within their directories easily.
- Changes are tracked automatically in a Git repository format, enabling users to undo actions without needing external assistance.
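The undo behaviour described above can be pictured with a small snapshot history. This is an in-memory stand-in, not real Git and not Space Agent's code: the `SnapshotHistory` class and its method names are hypothetical, and a dictionary of file names to contents substitutes for a working directory.

```python
import copy


class SnapshotHistory:
    """In-memory stand-in for commit-per-change tracking (not real Git)."""

    def __init__(self, files=None):
        self._snapshots = [copy.deepcopy(files or {})]

    @property
    def current(self):
        """Return a copy of the latest snapshot."""
        return copy.deepcopy(self._snapshots[-1])

    def commit(self, files):
        """Record a new snapshot and return its index."""
        self._snapshots.append(copy.deepcopy(files))
        return len(self._snapshots) - 1

    def revert_to(self, index):
        """Restore an earlier snapshot by recording it as a new commit, keeping history."""
        return self.commit(self._snapshots[index])


history = SnapshotHistory({"notes/widget.js": "render(v1)"})
history.commit({"notes/widget.js": "render(broken)"})
history.revert_to(0)
print(history.current)
```

Recording a revert as a new snapshot, rather than deleting later ones, means an undo can itself be undone.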
Admin Mode Functionality
- An admin mode provides a persistent interface that allows users to fix issues even if they break the main application interface completely.
- This mode enables interaction with agents and management of installed modules while maintaining access to time travel features.
Troubleshooting and Support
- If an agent breaks itself due to critical dependency issues, manual intervention may be required; however, Space Agent simplifies this process significantly.
- Beginners often face challenges during debugging but can utilize built-in tools within Space Agent for easier troubleshooting.
Getting Started with Space Agent
- Users are encouraged to explore both the GitHub repository and the official website for comprehensive information about setup and usage.
- The easiest way for newcomers is trying out the live version available on space-agent.ai before downloading any local applications.