I Tested Every AI Automation Agent To Find The Best One (not what you think)

Uploaded: 2025-05-16T18:41:08.000Z
Duration: 34 min 53 s

Testing AI Automation Agents

Overview of the Testing Process

The video evaluates several AI automation agents, including Manus, GenSpark, Deep Agent, and Convergence, to determine which performs best on tasks such as web scraping and report generation.

The evaluation will consider not only performance but also pricing and features of each tool. The presenter anticipates surprising results.

Web Scraping Task

The first test involves scraping data from multiple websites with varying formats to challenge the agents' capabilities. A specific prompt was used to guide their actions.

Five agents (Deep Agent, Suna, Convergence, Manus, GenSpark) were tasked with gathering information on funds/ETFs from specified sites. Each agent's approach and efficiency were monitored closely.
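To illustrate why sites with varying formats make this task harder, here is a minimal sketch of the normalization step a scraper needs once the raw rows are extracted. The row shapes, field names, and sample values are all hypothetical placeholders, not data from the video or from any real site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fund:
    ticker: str
    name: str
    source: str


# Hypothetical raw rows, shaped differently per site (placeholder values only).
SITE_A_ROWS = [{"Symbol": "aaa ", "Fund Name": "Example Fund A"}]
SITE_B_ROWS = [("Example Fund B", "BBB"), ("Example Fund A", "AAA")]


def normalize_site_a(rows):
    """Site A (assumed) returns dicts keyed by column header."""
    return [
        Fund(row["Symbol"].strip().upper(), row["Fund Name"].strip(), "site_a")
        for row in rows
    ]


def normalize_site_b(rows):
    """Site B (assumed) returns (name, ticker) tuples."""
    return [
        Fund(ticker.strip().upper(), name.strip(), "site_b")
        for name, ticker in rows
    ]


def merge_funds(*fund_lists):
    """Merge lists, keeping the first record seen for each ticker."""
    merged = {}
    for funds in fund_lists:
        for fund in funds:
            merged.setdefault(fund.ticker, fund)
    return list(merged.values())


funds = merge_funds(normalize_site_a(SITE_A_ROWS), normalize_site_b(SITE_B_ROWS))
print([fund.ticker for fund in funds])
```

Counting the merged records against the number of funds each site actually lists is one simple way to catch the kind of incomplete output described in the observations that follow.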

Performance Observations

Deep Agent: Asked clarifying questions about the task requirements before proceeding. This cautious approach made the presenter wary of the agents that started without seeking any clarification.

GenSpark: Completed the task first but provided incomplete data by listing only a few funds instead of all available options on the Fidelity website. This indicates a lack of thoroughness despite speed.

Manus: Like GenSpark, it completed the task but did not capture all of the required information, listing fewer funds than expected.

Convergence & Suna: Both struggled significantly with the task; Suna's output was particularly confusing and unhelpful while Convergence also failed to deliver accurate results.

Conclusion of First Test: Deep Agent emerged as the most effective tool by successfully identifying over 150 funds across two websites while others either failed or provided incomplete data. It received high praise for its performance compared to others that fell short in accuracy or clarity.

Research Report Generation Task

The second test involved generating a comprehensive research report on top stocks for investment, requiring detailed analysis including charts and photos based on a specific prompt given to all five tools simultaneously.

Initial Responses

All agents began working on their reports promptly; however, they varied in their approaches:

Suna & Convergence: Both started building structured plans for their reports effectively.

Manus & GenSpark: Also started work quickly, but their outputs were not yet available at this point.

AI Tools Comparison: Follow-Up Questions and Performance

Importance of Follow-Up Questions in AI Tools

The speaker highlights that Deep Agent is unique among AI tools for asking follow-up questions, which enhances the quality of interaction by gathering specific information about investment preferences.

Other AI agents tend to assume users have no prior knowledge and proceed without context, potentially leading to less effective outcomes.

Evaluation of AI Agents' Performance

GenSpark performed well in executing tasks as requested, demonstrating reliability in following instructions without deviating from the user's goals.

Manus also received a positive evaluation for its performance, earning an 'A' grade alongside GenSpark for effectively completing tasks.

Convergence was criticized for failing to perform adequately; it triggered unnecessary human intervention with a CAPTCHA request, indicating a breakdown in functionality.

Issues with Specific AI Tools

Suna was noted as performing poorly on multiple tasks, failing to complete requests satisfactorily and receiving negative feedback from the speaker.

In contrast, Deep Agent excelled by providing comprehensive reports tailored to user specifications, including charts and relevant data.

New Task: Building a Calorie Intake Tracking App

The speaker assigns a new task across all tools: creating a calorie intake tracking app based on detailed specifications provided.

Each tool's ability to follow instructions accurately is tested again; GenSpark and Manus start working immediately.

Observations During Development Phase

Convergence attempts to clarify requirements but has lost credibility due to previous failures.

Suna sets up an initial task structure but does not execute the tasks in it, which the speaker jokingly compares to a teenager's approach.

Final Results of App Development

GenSpark produced a functional app that lacked meal input features; its overall performance was still deemed solid despite this limitation.

Deployment Challenges and Tool Comparisons

Deployment Confirmation

The speaker discusses the need to confirm deployment options, asking if the project should be deployed to a public URL or if the project file should be provided for self-deployment.

Expresses frustration over delays in deployment, indicating that multiple researchers are involved without clear progress.

Frustrations with Testing Tools

The speaker highlights dissatisfaction with testing tools, stating that they were unable to perform a simple test as requested.

Mentions encountering a sign-up requirement on Suna's platform, expressing discontent with needing to register just to view results.

Deep Agent Build Features

Introduces Deep Agent as a promising tool, showcasing its features such as meal tracking and goal adjustments.

Demonstrates adding meals (e.g., chicken breast), emphasizing this app's functionality compared to others.
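The two features shown in the demo, meal tracking and goal adjustment, can be sketched as a small data model. This is an illustrative outline only, not the code Deep Agent generated; the class names, method names, and the calorie figure for chicken breast are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Meal:
    name: str
    calories: int


@dataclass
class DailyLog:
    goal: int  # daily calorie goal, adjustable at any time
    meals: list = field(default_factory=list)

    def add_meal(self, name, calories):
        """Record a meal; negative calorie values are rejected."""
        if calories < 0:
            raise ValueError("calories must be non-negative")
        self.meals.append(Meal(name, calories))

    def total(self):
        return sum(meal.calories for meal in self.meals)

    def remaining(self):
        return self.goal - self.total()


log = DailyLog(goal=2000)
log.add_meal("chicken breast", 250)  # placeholder calorie value
log.goal = 1800  # goal adjustment
print(log.total(), log.remaining())
```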

Performance Evaluation of AI Tools

Conveys disappointment in several tools (Manus, Convergence, Suna), which failed to complete tasks effectively.

Declares Deep Agent the standout performer among the tested tools, suggesting it is worth subscribing to at $10/month for its comprehensive features.

Pricing and Value Assessment

Discusses pricing structures for various AI tools:

Abacus AI, which includes Deep Agent, starts at $10/month per user;

GenSpark has plans starting at $24.99/month;

Manus offers free credits, but its paid plan costs $39/month;

Convergence costs $20/month.
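For a rough yearly comparison, the monthly prices quoted above can be multiplied out. This sketch assumes twelve months at the listed monthly rate, with no annual-billing discounts, taxes, or usage-based credit charges, none of which are covered in the summary.

```python
from decimal import Decimal

# Monthly prices in USD as quoted in the video summary.
MONTHLY_PRICE_USD = {
    "Abacus AI": Decimal("10.00"),
    "GenSpark": Decimal("24.99"),
    "Manus": Decimal("39.00"),
    "Convergence": Decimal("20.00"),
}

# Assumption: 12 payments at the monthly rate, no discounts.
annual_cost = {tool: price * 12 for tool, price in MONTHLY_PRICE_USD.items()}
cheapest = min(MONTHLY_PRICE_USD, key=MONTHLY_PRICE_USD.get)

for tool, cost in sorted(annual_cost.items(), key=lambda item: item[1]):
    print(f"{tool}: ${cost}/year")
```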

Recommendations Based on Functionality

Critiques the free plans offered by Suna, GenSpark, Convergence, and Manus as inadequate for serious use.

Recommends Abacus and Deep Agent for their value while advising against Suna and Convergence due to poor performance.

Additional Features of Competing Tools

Highlights additional functionalities of GenSpark like AI slides and video generation capabilities.

Accessing Advanced AI Tools

Overview of RouteLLM and Its Features

RouteLLM is designed to direct each user query to the most suitable large language model (LLM), giving access to a range of specialized models.

The platform includes a code assistant known as CodeLLM, which aids users in coding tasks and programming queries.

Users can also utilize an AI engineer feature that allows for the creation of custom AI agents and chatbots tailored to specific needs.

The platform offers additional resources and tools beyond these core features, enhancing user experience with advanced functionalities.
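The routing idea described in this section, sending each query to the model best suited to it, can be illustrated with a toy keyword router. This is not how the platform's router works internally (the video does not say); the model names and keyword rules below are hypothetical.

```python
import re

# Hypothetical model categories and trigger keywords (illustration only).
ROUTES = [
    ("code-model", {"python", "bug", "function", "compile"}),
    ("math-model", {"integral", "equation", "prove"}),
]
DEFAULT_MODEL = "general-model"


def route_query(query: str) -> str:
    """Return the first model whose keywords appear in the query."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    for model, keywords in ROUTES:
        if words & keywords:
            return model
    return DEFAULT_MODEL


print(route_query("Can you fix this Python bug?"))
print(route_query("Summarize this article"))
```

A production router would more likely use a learned classifier than keyword rules; the keyword version is used here only because it keeps the dispatch logic visible.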

Introduction of MCP for Deep Agent

The recently released MCP (Model Context Protocol) support for Deep Agent is highlighted, with Deep Agent described as the only autonomous agent currently equipped with access to MCP.

Users can integrate up to five different MCP servers simultaneously, expanding their operational capabilities within the platform.
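As a rough sketch of what the five-server limit means in practice, the check below validates a server list before use. The configuration shape (a name mapped to a `command` or `url` entry) is an assumption modeled on how MCP clients are commonly configured, not Deep Agent's documented format, and the server names are made up.

```python
MAX_MCP_SERVERS = 5  # limit stated in the video summary


def validate_servers(servers: dict) -> dict:
    """Check the server count and that each entry says how to reach it."""
    if len(servers) > MAX_MCP_SERVERS:
        raise ValueError(
            f"{len(servers)} servers configured; the limit is {MAX_MCP_SERVERS}"
        )
    for name, config in servers.items():
        # Assumed shape: local servers use "command", remote ones use "url".
        if "command" not in config and "url" not in config:
            raise ValueError(f"server {name!r} needs a 'command' or 'url'")
    return servers


# Hypothetical server entries for illustration.
servers = validate_servers({
    "files": {"command": "example-files-server"},
    "search": {"url": "https://example.com/mcp"},
})
print(sorted(servers))
```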