Building Intelligent Research Agents with Manus - Ivan Leo, Manus AI (now Meta Superintelligence)
What is Manus and How Does It Work?
Overview of the Workshop
- The workshop showcases demos built with Manus and uses the new Manus API to reproduce the team's original Slackbot.
- Attendees are encouraged to ask questions throughout the session.
Understanding Manus
- Manus is described as an "action engine" that executes tasks, automates workflows, and extends human capabilities.
- Recent updates (Manus 1.5) have improved speed, quality, and user satisfaction while addressing challenges in building AI agents.
Features and Integrations
- Manus aims to be a versatile AI agent usable across various platforms including mailboxes, Slack, custom workflows, and mobile apps.
- Current offerings include a web application, Slack app, newly launched API, browser operator, Microsoft 365 integration for document editing, and Mail Manus.
Demonstrating Language Learning with Manus
Personal Use Case: Language Practice
- The speaker shares their experience using Manus to practice French by inputting daily thoughts into the app.
- Manus provides inline corrections and explanations for language queries through its integrated language model.
User Interaction with the App
- Users can prompt Manus for corrections or clarifications on specific words or phrases they encounter while practicing a language.
- The app creates a personalized profile based on user interactions which helps tailor suggestions for writing improvement.
Exploring Additional Functionalities of Mail Manus
Connecting Devices for Task Management
- The speaker demonstrates connecting their phone to manage tasks using Mail Manus within the app interface.
New Browser Operator and AI Integration
Introduction to the New Browser Operator
- The speaker discusses a new browser operator that enhances user interaction by allowing tasks to be managed more effectively.
- An example is provided where the speaker requests a simple coffee after a long flight, showcasing how the system can understand and execute user commands.
Functionality of the Remote Browser Operator
- The remote browser operator initiates a browser tab on the user's computer, enabling access to authenticated platforms like LinkedIn or Instagram.
- This feature significantly changes how users interact with various online services, as it allows for seamless task execution without sandbox limitations.
Demonstration of Task Execution
- The speaker demonstrates opening Google Maps through the initiated tab, illustrating real-time task management capabilities.
- Multiple instances of tasks can run in parallel, making it efficient for users with dedicated hardware setups.
Data Scraping and Event Management
- The speaker shares an experience where they instructed the AI to scrape event data from a website, which was then organized into a JSON file for easy access.
- Users can add events directly to their Google Calendar with one click, enhancing productivity through integration with external platforms.
Advanced Features and Future Developments
- The platform supports complex integrations such as Stripe and Redis Queue, allowing users to set up webhooks automatically.
- Future enhancements will include autoscaling and warm deployments, aimed at simplifying MVP development processes while maintaining affordability.
What is the Manus API?
Overview of Manus and Its Capabilities
- The speaker introduces Manus and its deployment, emphasizing its ease of use and potential for improvement.
- The Manus API allows users to build various applications, including websites and remote browser operations, with each chat having its own sandbox environment.
Internal Applications Built with Manus
- Examples of internal tools include a public-facing Slackbot, Zapier integration, n8n integration, and support bots that automate message interactions in chats.
- The speaker expresses enthusiasm about building these tools collaboratively and mentions hopes for compatibility with Stripe and Modal.
Getting Started with the Manus API
- To begin using the API, participants need an API key from their Manus account; billing aligns with standard chat usage costs.
- The goal is to provide a customizable experience without concerns over cost differences between using the API or chat integrations.
Setting Up Environment Variables
- Participants are guided on setting up necessary environment variables for Slack integration alongside obtaining their API key.
- Five notebooks will be utilized during the workshop to demonstrate functionality; instructions for acquiring Slack tokens are provided.
Validating the API Key
- After loading the environment variables, validation checks confirm that no files have been uploaded yet due to using a fresh account.
- All files uploaded via the Manus API are automatically deleted after 48 hours to ensure user privacy regarding sensitive information.
Creating Tasks Using the Manus API
- The speaker demonstrates creating a task through a curl request while mentioning documentation support for various AI frameworks like the OpenAI SDK.
- An example query ("What is 2 plus 2?") illustrates how tasks return essential data such as task ID, title, and URL for further interaction.
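The task-creation step above can be sketched roughly as follows. This is a minimal illustration, not the documented API: the base URL is a placeholder, and the `/tasks` path, the bearer-token header, and the `task_id`, `title`, and `url` field names are assumptions chosen to match what the talk describes. The request is only built, never sent.

```python
import json
import urllib.request

# Placeholder host: substitute the base URL from the official documentation.
BASE_URL = "https://manus-api.example/v1"


def build_create_task_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (without sending) a POST request that would create a task."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/tasks",  # hypothetical path
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # hypothetical auth scheme
            "Content-Type": "application/json",
        },
    )


def describe_task(response: dict) -> str:
    """Format the three fields the talk says a new task returns (names assumed)."""
    return f"{response['task_id']} | {response['title']} | {response['url']}"


request = build_create_task_request("test-key", "What is 2 plus 2?")
print(request.get_method(), request.full_url)
```

Passing the built request to `urllib.request.urlopen` would send it; the real header and field names should be taken from the API reference.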
Understanding Manus API and Its Features
Overview of Manus Models
- The Manus API has two models: Manus 1.5 and Manus 1.5 Lite, with the former recommended for complex tasks.
- For simpler queries requiring faster responses, Manus 1.5 Lite is preferred; this workshop uses the lighter model.
Context Management in Manus
- The new agent offers unlimited context management, allowing it to handle longer interactions effectively.
- Smart context management keeps the KV-cache hit rate high, leading to quick response times, as discussed in a related article by the CTO.
Task Creation and Status Tracking
- Users can create tasks (e.g., checking weather), which return a task ID, title, and URL for tracking.
- Tasks can be in one of four states: running, pending, completed, or error; ideally aiming to avoid errors.
Interaction Example
- An example interaction involves querying about weather forecasts and planning activities based on that information.
- Responses include credits used and messages exchanged between the user and the agent during the task.
Polling Mechanism
- Polling is introduced as a straightforward method for monitoring task status throughout its lifecycle.
- Understanding task statuses helps determine subsequent actions within applications using the Manus API.
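A polling loop over the four task states might look like the sketch below. The status lookup is injected as a plain callable so the example runs without network access; in a real client it would wrap a request to the task endpoint, whose exact shape is not given here. The interval and attempt limit are arbitrary assumptions.

```python
import time
from typing import Callable

# "running" and "pending" mean keep waiting; these two end the loop.
TERMINAL_STATES = {"completed", "error"}


def poll_until_done(
    fetch_status: Callable[[], str],
    interval_s: float = 2.0,
    max_attempts: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status repeatedly until the task reaches a terminal state."""
    for _ in range(max_attempts):
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        sleep(interval_s)
    raise TimeoutError(f"task still not finished after {max_attempts} checks")


# Usage with a scripted stand-in for a real status lookup.
_scripted = iter(["pending", "running", "running", "completed"])
final_status = poll_until_done(lambda: next(_scripted), sleep=lambda _s: None)
print(final_status)
```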
Managing Contextual Information
Importance of Context in Language Models
- Maintaining context is crucial when working with language models; various methods exist for managing this within the API.
File Upload Capabilities
- The API supports file uploads for sensitive or larger files, enhancing functionality beyond simple text inputs.
Practical Application with Rick and Morty Data
- A practical example involves fetching character data from "Rick and Morty" via an external free API to demonstrate file handling capabilities.
Data Structure Overview
- The fetched data includes character details such as status, origin location, episodes they appeared in, along with images.
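As a rough illustration of that structure, the sketch below flattens character records into the fields mentioned above and tallies them by status. The sample records are hand-written placeholders in the general shape of a character response (a `results` list with `name`, `status`, `origin`, `episode`, and `image` keys), not real API output.

```python
from collections import Counter

# Placeholder records, written by hand for illustration only.
SAMPLE_PAGE = {
    "results": [
        {"name": "Example Character 1", "status": "Alive",
         "origin": {"name": "Example Planet"},
         "image": "https://example.com/avatar/1.jpeg",
         "episode": ["https://example.com/episode/1", "https://example.com/episode/2"]},
        {"name": "Example Character 2", "status": "Dead",
         "origin": {"name": "unknown"},
         "image": "https://example.com/avatar/2.jpeg",
         "episode": ["https://example.com/episode/1"]},
        {"name": "Example Character 3", "status": "Alive",
         "origin": {"name": "Example Planet"},
         "image": "https://example.com/avatar/3.jpeg",
         "episode": ["https://example.com/episode/3"]},
    ]
}


def flatten_character(character: dict) -> dict:
    """Keep only the fields that are useful for a simple summary table."""
    return {
        "name": character["name"],
        "status": character["status"],
        "origin": character["origin"]["name"],
        "episode_count": len(character["episode"]),
        "image": character["image"],
    }


rows = [flatten_character(c) for c in SAMPLE_PAGE["results"]]
status_counts = Counter(row["status"] for row in rows)
print(status_counts)
```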
How to Use the Files API and Manage Data in Manus
Uploading and Managing Files
- The process begins with checking the database, saving it, and generating an S3 link for uploading larger files via a PUT request. A file ID is created upon upload.
- After uploading, users can create a simple website to visualize data from the uploaded files, such as images or PDFs.
- Manus supports various file types out of the box, including handling multimodal content like PDFs efficiently.
- Users have the option to delete files at any time through an API call, which is useful for managing large datasets that may need automatic deletion.
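The upload step can be pictured as a plain HTTP PUT of the file bytes to the link returned by the Files API. The sketch below only builds that request; the upload URL is a placeholder, and the idea that the file ID comes back from a separate earlier call is an assumption about the flow rather than something the talk spells out.

```python
import json
import urllib.request


def build_upload_request(
    upload_url: str, payload: bytes, content_type: str = "application/json"
) -> urllib.request.Request:
    """Build (without sending) a PUT request carrying the raw file bytes."""
    return urllib.request.Request(
        url=upload_url,
        data=payload,
        method="PUT",
        headers={"Content-Type": content_type},
    )


# Placeholder data and URL, for illustration only.
characters = [{"name": "Example Character", "status": "Alive"}]
payload = json.dumps(characters).encode("utf-8")
upload = build_upload_request("https://uploads.example/presigned-path", payload)
print(upload.get_method(), len(upload.data), "bytes")
```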
Utilizing URL Attachments
- The platform allows for URL attachments; for instance, users can send raw call transcripts directly to Manus from other applications like Circleback.
- An example provided involves using a PDF link from Warren Buffett's investor letters instead of a transcript for analysis.
Task Automation and Analysis
- Users can initiate tasks that automatically analyze public URLs, for example analyzing an investor letter to summarize insights about company strategies.
- Connections configured in Manus (e.g., Gmail or Notion) work seamlessly with these tasks without additional setup required.
Bug Investigation Tasks
- An automated bug investigation task was demonstrated where users could submit encoded images of error pages (like 404 errors), allowing Manus to diagnose issues effectively.
Data Visualization Capabilities
- The Rick and Morty dataset showcases how character data can be visualized on a website, displaying gender distribution and character status (alive/dead).
- Even when a file is misnamed (e.g., a PDF called "transcript.json"), Manus correctly identifies the file type and performs the analysis accordingly.
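One common way to recognize a file regardless of its name is to inspect its leading bytes. The sketch below shows that general technique for a few formats; it is not a description of how Manus does it, which the talk does not explain.

```python
def sniff_file_type(data: bytes) -> str:
    """Guess a file type from its leading bytes, ignoring the file name."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    # JSON has no signature; a leading brace or bracket is only a hint.
    if data.lstrip()[:1] in (b"{", b"["):
        return "json"
    return "unknown"


# A PDF saved under a misleading name is still recognized by its content.
misnamed = {"name": "transcript.json", "content": b"%PDF-1.7\n..."}
print(misnamed["name"], "->", sniff_file_type(misnamed["content"]))
```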
Webhooks for Task Completion Notifications
- To enhance efficiency as task volume increases, webhooks are introduced. They notify users via API requests when tasks are completed rather than relying on polling every few seconds.
Understanding Webhooks and Task Management
The Importance of Webhooks in Task Management
- Knowing when a task completes is crucial; webhooks provide real-time updates about tasks without the need for constant API polling.
- Utilizing webhooks allows for a more sustainable and cost-effective approach to managing multiple tasks, as opposed to spinning up numerous workers.
Setting Up Modal for Python Applications
- Modal enables deployment of simple Python applications, such as FastAPI endpoints, providing users with public endpoints upon setup.
- An endpoint can come under load when many requests hit it simultaneously, which highlights the importance of efficient endpoint management.
Registering and Using Webhooks
- When registering a webhook with Manus, notifications are sent when tasks are created or completed, streamlining task tracking on the front end.
- For complex tasks that take longer (3-5 minutes), webhooks eliminate the need for continuous polling by sending notifications directly.
Managing Multiple Tasks Efficiently
- With many tasks running concurrently, webhooks allow developers to receive notifications without manual checks or excessive API calls.
- Successful registration of a web hook involves posting to the Manus API at a specific base URL.
Creating and Tracking Tasks
- After creating a task (e.g., solving "2 plus 2"), users can track its status through received payload data once completed.
- The returned JSON payload includes essential information like task ID and output status, facilitating identification among multiple ongoing tasks.
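A webhook receiver mostly needs to match the incoming payload against the tasks it is waiting on. The sketch below shows that matching step as a plain function; the `event_type` and `task_id` field names and the `"task_completed"` value are assumptions for illustration, not the documented payload.

```python
from typing import Optional


def handle_task_webhook(payload: dict, pending: dict) -> Optional[str]:
    """Return the label of the task that just finished, or None otherwise.

    `pending` maps task IDs to a label such as the question that was asked.
    """
    task_id = payload.get("task_id")
    if payload.get("event_type") != "task_completed" or task_id not in pending:
        return None
    # Remove the task so a duplicate notification is ignored.
    return pending.pop(task_id)


pending_tasks = {"task-1": "What is 2 plus 2?", "task-2": "Weather check"}
finished = handle_task_webhook(
    {"event_type": "task_completed", "task_id": "task-1"}, pending_tasks
)
print(finished, "| still pending:", list(pending_tasks))
```

In a deployed app this function would sit behind a public POST endpoint, for example a FastAPI route.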
Integrating Slack with Task Management
Setting Up Slack Integration
- Transitioning from basic task management to integrating Slack enhances interaction capabilities with chatbots within designated channels.
Testing Slack Bot Functionality
- Obtaining Slack signing secrets is necessary for bot functionality; testing these secrets ensures proper integration before full deployment.
How to Build a Simple Slack Bot
Setting Up the Bot
- The process begins with defining a simple bot in bot.py and running it with modal serve.
- A basic Modal app is initialized with the necessary dependencies, including a stored Slack secret. An endpoint called /webhooks/slack is created for interaction.
- When this endpoint is hit, it returns a status response indicating that it is publicly accessible for Slack interactions.
Interacting with Slack API
- Upon mentioning the bot in Slack (e.g., "what's up"), an API request is generated, returning a payload containing user information.
- To register the server as a webhook on Slack, it's essential to respond with the challenge value that Slack sends in its URL verification request.
- After successfully verifying the webhook and saving changes, the bot can now receive mentions from users in Slack.
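The verification handshake is small: Slack posts a body with `type` set to `url_verification` and a `challenge` string, and the endpoint echoes the challenge back. The sketch below shows that branch as a plain function; the `{"ok": True}` acknowledgement for other events is an arbitrary choice.

```python
def handle_slack_request(body: dict) -> dict:
    """Answer Slack's URL verification, otherwise acknowledge the event."""
    if body.get("type") == "url_verification":
        return {"challenge": body["challenge"]}
    return {"ok": True}


verification = handle_slack_request(
    {"type": "url_verification", "token": "unused", "challenge": "abc123"}
)
print(verification)
```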
Responding to Events
- When mentioned in Slack, an API request captures details like user ID and message text which need parsing for responses.
- To send responses back to Slack, extract necessary data such as channel ID and thread timestamp (TS).
- A successful response example includes sending "hello world" back to the channel where the mention occurred.
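The parsing described above can be sketched as two helpers: one strips the bot mention from the text, the other builds the arguments for Slack's `chat.postMessage` so the reply lands in the same thread. The sample event is hand-written, and sending the payload (with a bot token) is left out.

```python
import re


def extract_prompt(event: dict) -> str:
    """Remove the leading <@BOTID> mention and return the user's text."""
    return re.sub(r"<@[A-Z0-9]+>\s*", "", event["text"]).strip()


def build_reply(event_body: dict, text: str) -> dict:
    """Build chat.postMessage arguments that reply in the originating thread."""
    event = event_body["event"]
    return {
        "channel": event["channel"],
        # Reply in the existing thread, or start one on the mention itself.
        "thread_ts": event.get("thread_ts", event["ts"]),
        "text": text,
    }


sample_body = {
    "type": "event_callback",
    "event": {
        "type": "app_mention",
        "user": "U123",
        "text": "<@U0BOT> what's up",
        "ts": "1700000000.000100",
        "channel": "C123",
    },
}
reply = build_reply(sample_body, "hello world")
print(extract_prompt(sample_body["event"]), "->", reply)
```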
Handling Response Timing
- It's crucial to respond within approximately three seconds; otherwise, Slack will retry sending requests. Keeping servers warm can help manage this timing effectively.
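Because Slack retries deliver the same event again, one defensive pattern is to acknowledge quickly and skip events that were already seen. The sketch below deduplicates on the `event_id` field of the Events API envelope; the in-memory set is an assumption that only holds for a single long-lived process.

```python
def should_process(body: dict, seen_event_ids: set) -> bool:
    """Return False for an event that was already handled (e.g. a retry)."""
    event_id = body.get("event_id")
    if event_id in seen_event_ids:
        return False
    seen_event_ids.add(event_id)
    return True


seen: set = set()
first = should_process({"event_id": "Ev001"}, seen)
retry = should_process({"event_id": "Ev001"}, seen)
print(first, retry)
```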
File Uploading Process
- Users can upload files directly into channels on Slack. Once uploaded, each file receives an ID for future reference.
- After uploading files, messages can be sent that include these file IDs as attachments for context or additional information.
Managing Threads and Files
- It’s important to ensure files are uploaded correctly within threads; otherwise they may not appear where intended in conversations.
Integrating Additional Functions
- The next step involves integrating additional functions from previous setups into the current application structure for enhanced functionality.
- Helper functions are utilized alongside newly defined classes to streamline operations within the main application framework.
How to Integrate Slack with a Web Application
Initial Setup and Bug Fixes
- The speaker discusses a bug encountered while integrating Slack with a web application, specifically related to not defining the Manus URL correctly.
- After resolving the issue, they demonstrate that sending messages through Slack can trigger webhooks, allowing for processing on their server.
Creating Tasks in Manus
- A function called "create a Manus task" is introduced, which posts requests to Manus using environment variables and prompts.
- The speaker tests the task creation process by confirming successful file creation and providing a link for viewing.
Improving User Experience
- The speaker notes that while tasks can be sent to Manus, responses are not being received. They plan to address this issue.
- An acknowledgment of an "ugly setup" leads to plans for improving the user interface by parsing user IDs from Slack API responses.
Enhancing Message Formatting
- The discussion shifts towards updating message formats in Slack for better clarity and aesthetics.
- Introduction of Slack Block Kit, which allows for more visually appealing messages with buttons and links.
Finalizing Task Creation Process
- The speaker adds blocks to messages, including URLs and emojis, enhancing interactivity within Slack messages.
- After testing changes, they confirm that the new UI is more engaging and user-friendly.
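A Block Kit message is a list of block dictionaries passed as `blocks` to `chat.postMessage`. The sketch below builds a section with mrkdwn text plus a link button; the wording, emoji, and `action_id` are illustrative choices, not the ones used in the workshop.

```python
def build_task_blocks(title: str, task_url: str) -> list:
    """Build Block Kit blocks announcing a new task with a link button."""
    return [
        {
            "type": "section",
            "text": {"type": "mrkdwn", "text": f":rocket: *Task created:* {title}"},
        },
        {
            "type": "actions",
            "elements": [
                {
                    "type": "button",
                    "text": {"type": "plain_text", "text": "View task", "emoji": True},
                    "url": task_url,  # opens the task page in the browser
                    "action_id": "view_task",
                }
            ],
        },
    ]


blocks = build_task_blocks("What is 2 plus 2?", "https://example.com/tasks/abc")
print(len(blocks), "blocks")
```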
Multi-Turn Conversations in Slack Bots
- Discussion on enabling multi-turn conversations within the bot; tracking previous threads is essential for maintaining context.
- A simple dictionary or KV store will be used to keep track of conversation history when using cloud services.
Understanding the Use of a Dictionary in Task Management
Introduction to Modal and Endpoints
- The discussion begins with an overview of using Modal, which provides a robust dictionary for managing tasks. Two simple endpoints will be added to facilitate posting and setting requests.
Adding New Endpoints
- The speaker emphasizes the importance of integrating new endpoints into the server setup, ensuring that they function correctly within the existing framework.
Managing Task Data
- When retrieving chat data, it’s crucial to store additional information such as Slack channel details, user IDs, and job statuses. A dictionary is used for easy serialization of key-value pairs.
Creating and Storing Tasks
- A simple Manus task is created, and its data (ID, URL, etc.) is stored in a structured manner. This allows retrieval through the same endpoint later on.
Updating Task Handling Logic
- The speaker plans to delete previous endpoints and create a new task mapping system that links thread IDs to specific tasks. This involves parsing incoming messages from Slack effectively.
Enhancing User Interaction with Thread Management
Parsing Incoming Messages
- A function is introduced to parse messages when users mention a bot in Slack. It checks if the thread ID has been encountered before to manage responses appropriately.
Reacting to User Inputs
- If a thread ID has been seen previously, a reaction (like an emoji indicating acknowledgment) is added by the bot to inform users that their request is being processed.
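The thread-to-task mapping can be sketched with an ordinary dictionary standing in for the hosted key-value store. A new thread creates a task and records it; a known thread reuses the stored task and returns arguments for Slack's `reactions.add` as an acknowledgment. The function and key names here are hypothetical.

```python
from typing import Callable


def route_mention(
    thread_tasks: dict,
    channel: str,
    thread_ts: str,
    message_ts: str,
    create_task: Callable[[], str],
) -> dict:
    """Decide whether a mention starts a new task or continues an existing one."""
    task_id = thread_tasks.get(thread_ts)
    if task_id is None:
        task_id = create_task()
        thread_tasks[thread_ts] = task_id
        return {"action": "created", "task_id": task_id}
    return {
        "action": "continued",
        "task_id": task_id,
        # Arguments for reactions.add, acknowledging the follow-up message.
        "reaction": {"channel": channel, "timestamp": message_ts, "name": "eyes"},
    }


store: dict = {}
first_turn = route_mention(store, "C123", "100.1", "100.1", lambda: "task-1")
second_turn = route_mention(store, "C123", "100.1", "100.2", lambda: "task-2")
print(first_turn["action"], second_turn["action"], second_turn["task_id"])
```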
Implementing New Functionality
- A new function called handleManageTask replaces older logic for better handling of incoming messages related to task management.
Finalizing Task Responses
Adjusting Server Logic
- The speaker identifies an issue where final answers do not return correctly to users in Slack. They plan adjustments similar to previous webhook implementations for proper response delivery.
Uploading Files and Final Endpoint Setup
- Discussion includes uploading files necessary for task completion while ensuring all required parameters are included in requests sent back through webhooks.
Testing New Features in Real-Time
Testing Server Functionality
- After implementing changes, testing begins by waking up the server and sending test messages via Slack. This helps verify whether tasks are created successfully and responses are managed properly.
Slack Bot Development and API Integration
Overview of the Task Execution
- The speaker initiates a task with a specific ID, indicating that they will view the live task execution and expect to receive a response via a Slack webhook.
Debugging Technical Difficulties
- The speaker encounters technical issues while explaining the functionality of a Slack bot that processes responses from Manus, which include various markdown elements.
Markdown Transformation for Slack Compatibility
- The bot receives markdown content from Manus, including tables and images, which need proper rendering in Slack due to its unique formatting requirements.
- A task info map is created when initiating new tasks, followed by converting this information into a compatible Slack markdown format to ensure consistency.
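A small converter illustrates the kind of transformation involved: Slack's mrkdwn uses `*bold*` and `<url|label>` links and has no heading syntax. The sketch below covers only bold text, links, images (turned into links), and headings (turned into bold lines); tables and nested formatting are deliberately left out, and this is not the workshop's actual converter.

```python
import re


def markdown_to_mrkdwn(markdown: str) -> str:
    """Convert a small subset of Markdown to Slack mrkdwn."""
    # Images first, so the link rule below does not see them.
    text = re.sub(
        r"!\[([^\]]*)\]\(([^)]+)\)",
        lambda m: f"<{m.group(2)}|{m.group(1) or 'image'}>",
        markdown,
    )
    text = re.sub(r"\[([^\]]+)\]\(([^)]+)\)", r"<\2|\1>", text)  # links
    text = re.sub(r"\*\*(.+?)\*\*", r"*\1*", text)  # bold
    text = re.sub(r"^#{1,6}\s+(.+)$", r"*\1*", text, flags=re.MULTILINE)  # headings
    return text


sample = "# Report\n**Total**: see [receipt](https://example.com/r.pdf)"
converted = markdown_to_mrkdwn(sample)
print(converted)
```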
File Uploading Enhancements
- Previous uploads were directed only to the main channel; now, parameters are added to allow file uploads to both channels and threads in Slack.
Implementation Challenges and Questions
- The speaker plans to implement additional features using pre-written code due to potential technical difficulties. They pause for questions regarding API usage.
Testing with Real Examples
- The speaker tests the system using an image of bagels they purchased, sharing personal anecdotes about their experiences with local bagels.
Workflow Demonstration with Notion Integration
- After creating files and events in Slack, the bot communicates progress updates on requests. It integrates with Notion for referencing company policies relevant to claims processing.
OCR Functionality in Receipt Processing
- Manus utilizes Optical Character Recognition (OCR) technology to extract details from receipts uploaded through the system, demonstrating practical applications of AI in expense management.
Invoice Management Capabilities
- Users can request updates on invoices through natural language prompts. Manus retrieves necessary details from previous receipts and updates them according to company policy guidelines stored in Notion.
Conclusion on API Utility
- Despite some challenges during live coding sessions, the speaker emphasizes that the Manus API simplifies building complex applications efficiently.
Introduction to the Manus API
Overview of Manus API Capabilities
- The Manus API allows users to scale conversations effectively, handling millions of interactions daily. It supports various integrations and custom code uploads.
- Users can focus on core business logic without getting bogged down by technical details, thanks to the flexibility offered by the Manus API.
Engagement and Support
- Ivan Leo invites inquiries about the Manus API and mentions that they are currently hiring, encouraging audience interaction for questions.
- He suggests starting with their web app as a user-friendly introduction before diving into the more complex aspects of the API.
Getting Started with the Web App
Initial Steps for Users
- Users are encouraged to explore the web app first, which simplifies tasks without needing extensive integration or webhooks.
- The approach involves identifying repetitive problems, testing solutions in the web app, and then transitioning to using the API for more tailored needs.
Building a Conference Site Using Manus
Development Process
- Ivan shares his experience creating a conference site by scraping event data and implementing features like search functionality and mobile compatibility.
- He emphasizes an iterative process where he utilized various services alongside Manus to handle complex edge cases effectively.
Technical Insights
- A Python script was generated through Manus that converted event times from AM/PM format to UTC while scraping event information.
- The system's ability to read HTML and execute JavaScript allowed it to accurately gather necessary event data.
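The time conversion mentioned above can be sketched with the standard library. This version assumes the scraped date and 12-hour time arrive as separate strings and that the venue's UTC offset is known and fixed; the generated script's actual inputs are not described in the talk.

```python
from datetime import datetime, timedelta, timezone


def to_utc(date_str: str, time_str: str, utc_offset_hours: float) -> datetime:
    """Parse a local 12-hour time such as '2:30 PM' and convert it to UTC."""
    local = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %I:%M %p")
    local = local.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return local.astimezone(timezone.utc)


# Example: an afternoon session in a UTC-5 time zone.
session_utc = to_utc("2025-11-20", "2:30 PM", -5)
print(session_utc.isoformat())
```

A fixed offset ignores daylight saving changes; `zoneinfo.ZoneInfo` with a named zone would handle those.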
User Privacy Concerns
Data Handling Practices
- Ivan addresses concerns regarding user privacy, clarifying that transcripts are not accessible by staff unless issues arise that require investigation.
- All user data is stored securely in the US, ensuring compliance with privacy standards.
Exploring Use Cases for Manus API
Current Applications
- The speaker highlights ongoing exploration of use cases for the new API, particularly in research contexts where users seek insights from large datasets.
Advantages of Manus and Its Features
Innovative Use Case: Automating Pickleball Reservations
- The speaker shares a personal experience using Manus to automate the booking of pickleball slots in Singapore, noting how competitive it is to secure one.
- A Python script was developed that utilized Selenium to scrape government websites for available pickleball slots, showcasing the practical benefits of using an agent with its own sandbox environment.
Browser Integration and User Permissions
- Discussion on the potential for integrating browser functionality through the API, emphasizing user control over which browser is used.
- Plans to enhance the permission system are mentioned, ensuring users can authorize necessary actions without unwanted tab openings.
Future Features: Markdown and Slide Generation
- Upcoming features will allow users to generate markdown or slides from Manus, maintaining consistency across different formats like PDF exports.
- Emphasis on achieving feature parity between API and UI experiences, ensuring seamless usability regardless of platform.
Memory Functionality in Conversations
- The speaker addresses inquiries about memory capabilities within conversations, indicating that while it's not currently possible, it is being actively considered for future updates.
- Users must be explicit in their interactions with Manus at present; however, there are plans to improve this aspect moving forward.