Building AI Agents from Scratch | Full Course
Introduction to Agentic Design Patterns
Overview of the Course
- The video serves as a compilation of previous lessons on agentic design patterns, aimed at creating a comprehensive course.
- The creator notes significant interest in the ML community regarding these videos, prompting the decision to compile them into an open-source course.
Course Structure and Content
- The course will not rely on any frameworks; it will be implemented from scratch using Python and Groq as the LLM provider.
- Four key agentic patterns will be covered: Reflection Pattern, Tool Use Pattern, Planning Pattern, and Multi-Agent Framework.
Detailed Breakdown of Modules
Module Descriptions
- The first module focuses on the Reflection Pattern; subsequent modules cover Tools, Planning, and Multi-Agent Patterns.
- Additional resources are available through a Substack blog with written explanations and code snippets for each module.
Implementation of Agentic Patterns
Project Introduction
- The project, titled "Agentic Patterns," involves implementing from scratch the four agentic patterns defined by DeepLearning.AI.
Focus on Reflection Pattern
- Today's session begins with the Reflection Pattern; future videos will address Tool Use, Planning, and Multi-Agent Patterns.
Understanding the Reflection Pattern
Workflow Explanation
- The reflection pattern consists of two main blocks: Generate Block (produces content based on user prompts) and Reflect Block (provides feedback).
Process Flow
- A user prompt is processed by the LLM to generate initial content (e.g., an essay about a composer).
- Generated content is critiqued by the Reflect Block which suggests modifications before sending it back to Generate Block for revision.
Looping Mechanism in Reflection Pattern
Iteration Control
- The loop can continue indefinitely until stopped either by reaching a predefined number of iterations or through specific stop keywords defined in system prompts.
Practical Application
- Understanding this pattern's mechanics can enhance results significantly when applied to various LLM calls.
Transitioning to Code Implementation
Initial Steps
- Viewers are instructed to clone the repository where they will work with two main folders for coding exercises related to agentic design patterns.
Reflection Pattern Implementation in Python
Overview of the Notebooks Folder
- The initial focus is on the notebooks folder, which currently contains a reflection pattern but will expand as the project develops.
- Future additions to this folder are anticipated, enhancing its content over time.
Reflection Pattern Folder
- A separate reflection pattern folder exists that includes a proper implementation in Python. The session begins with an easy-to-understand notebook implementation before transitioning to more complex code.
- Users are encouraged to start Jupyter Lab or Jupyter Notebook to follow along with the tutorial.
Steps of the Reflection Pattern
- The reflection pattern allows an LLM (Large Language Model) to critique and refine its outputs through three main steps:
- Generate a candidate output.
- Reflect on that output.
- Modify the original output based on reflections, initiating another iteration.
Initial Code Generation
- The first task involves generating an implementation of the merge sort algorithm using Groq, specifically leveraging the Llama 3 model (70 billion parameters).
- Environment variables must be set up correctly, including defining a Groq API key for functionality.
System Prompt for Code Generation
- A system prompt instructs the model to act as a Python programmer tasked with producing high-quality code and responding constructively to critiques from users. This sets expectations for performance during code generation tasks.
Displaying Generated Code
- After executing the generation block, an initial Python implementation of merge sort is displayed, showcasing two defined functions, `merge_sort` and `merge`, along with example usage demonstrating its functionality.
Reflection Step on Generated Code
- In this phase, feedback is provided by simulating Andrej Karpathy's persona, critiquing aspects such as code organization, naming conventions, and performance considerations while suggesting improvements like additional test cases for robustness.
- The critique results in revised implementations that include docstrings and enhanced sample usage examples reflecting best practices in coding standards.
Iterative Improvement Process
- Following feedback incorporation into the generation chat history allows for further iterations where improved versions of code are generated based on previous critiques.
- This iterative process can continue indefinitely; however, it has been streamlined through implementing a dedicated Python class designed for better reflection management within this context.
Reflection Agent Implementation
Overview of Reflection Agent
- The reflection agent is imported from the `agentic_patterns` library, and installation can be done via Poetry for dependency management.
- The class defines a `generate` method to produce code and a `reflect` method that critiques the generated output, enhancing understanding through feedback.
- The main entry point is the `run` method, which takes various prompts and parameters to manage iterations effectively.
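A compressed sketch of what such a class might look like. Method names (`generate`, `reflect`, `run`) follow the description above, but the structure and the `_call_llm` stub are illustrative, not the library's exact API; the real class calls Groq.

```python
class ReflectionAgent:
    """Sketch of a reflection agent; names and structure are illustrative."""

    def __init__(self, generation_system_prompt: str, reflection_system_prompt: str):
        # Separate chat histories for the generator and the critic.
        self.generation_history = [{"role": "system", "content": generation_system_prompt}]
        self.reflection_history = [{"role": "system", "content": reflection_system_prompt}]

    def _call_llm(self, history: list) -> str:
        # Placeholder for a Groq chat-completion call.
        return f"output after {len(history)} messages"

    def generate(self) -> str:
        # Produce a candidate output and show it to the critic.
        output = self._call_llm(self.generation_history)
        self.generation_history.append({"role": "assistant", "content": output})
        self.reflection_history.append({"role": "user", "content": output})
        return output

    def reflect(self) -> str:
        # Critique the latest output and feed the critique back to the generator.
        critique = self._call_llm(self.reflection_history)
        self.reflection_history.append({"role": "assistant", "content": critique})
        self.generation_history.append({"role": "user", "content": critique})
        return critique

    def run(self, user_msg: str, n_steps: int = 3) -> str:
        self.generation_history.append({"role": "user", "content": user_msg})
        output = ""
        for _ in range(n_steps):
            output = self.generate()
            self.reflect()
        return output
```

The cross-posting between the two histories (each side's output becomes the other side's user message) is the core of the pattern.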
Functionality of the Reflection Agent
- It generates history lists for both generation and reflection while iterating through defined steps, utilizing a step tracker for visual outputs.
- Demonstration includes importing the agent, defining prompts, and running it to observe iterative outputs in different colors for clarity.
Tool Pattern Exploration
Introduction to Tool Pattern
- Transitioning to discussing the tool pattern, emphasizing its practical use in frameworks like LangChain or LlamaIndex without focusing on specific implementations.
Understanding Tools Underlying Mechanism
- A tool allows an LLM (Language Model) to access external information beyond its stored weights by executing functions or APIs.
- Tools serve as Python functions enabling LLMs to fetch relevant data from outside sources when internal knowledge is insufficient.
Example of Tool Functionality
- A simple Python function example illustrates how tools can retrieve current weather data based on location input.
Understanding Function Integration with LLMs
Introduction to the Function and Its Purpose
- The function is designed to return a temperature of 25° C for Madrid, illustrating a basic example of how a tool operates.
- The function outputs a dictionary containing temperature data when called with specific parameters (location: Madrid, unit: Celsius).
Making Functions Accessible to LLMs
- To enable an LLM (Language Model) to understand and utilize the Python function, it requires input in text format.
- A system prompt is proposed that instructs the LLM to behave as if it were calling a function, including relevant information within XML tags.
Structure of System Prompt and Tool Call
- The expected output from the LLM includes the name of the function and its arguments formatted within XML tags.
- Each tool's information provided includes its name, description, parameters, and their types (e.g., strings for location and unit).
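A hedged sketch of what such a system prompt can look like; the course's exact wording and tag names differ, and the weather tool shown is the dummy example from above.

```python
# Illustrative tool-use system prompt; the exact wording in the course differs.
TOOL_SYSTEM_PROMPT = """
You are a function-calling AI. You may call one of the tools listed below.
For each call, return the function name and arguments inside <tool_call> tags:

<tool_call>
{"name": "<function-name>", "arguments": {"<arg>": "<value>"}}
</tool_call>

Available tools:
<tools>
{"name": "get_current_weather",
 "description": "Get the current weather for a location",
 "parameters": {"location": {"type": "str"}, "unit": {"type": "str"}}}
</tools>
"""

# An LLM following this prompt would answer a question about Madrid with e.g.:
# <tool_call>
# {"name": "get_current_weather", "arguments": {"location": "Madrid", "unit": "celsius"}}
# </tool_call>
```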
Implementation Using Different LLM Versions
- The session utilizes a different version of the Llama model fine-tuned for tool use compared to previous examples.
- A constant system prompt is defined in code; an inquiry about current temperature in Madrid is made using this prompt.
Processing Output from the LLM
- The output structure returned by the LLM matches expectations but needs further processing to be usable.
- A dedicated function processes this output by removing XML tags and converting string dictionaries into proper Python dictionaries.
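A processing function along those lines might look like this. It is a sketch: the tag name and the assumption that the model emits valid JSON between the tags are illustrative (the course converts string dictionaries; `json.loads` is one way to do that).

```python
import json
import re

def parse_tool_call(completion: str) -> dict:
    """Extract the JSON payload from <tool_call>...</tool_call> tags."""
    match = re.search(r"<tool_call>\s*(.*?)\s*</tool_call>", completion, re.DOTALL)
    if match is None:
        raise ValueError("no tool call found in completion")
    # Convert the string dictionary into a proper Python dictionary.
    return json.loads(match.group(1))
```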
Finalizing User-Friendly Responses
- After obtaining results as a Python dictionary, additional steps are taken to format responses more naturally for user interaction.
- The final response aims to present information clearly (e.g., "The current temperature in Madrid is 25° C") rather than as raw data.
Dynamic Tool Integration in Python
Overview of Dynamic Function Signature Extraction
- The speaker discusses the limitations of a step-by-step approach for generating function signatures, emphasizing that it is not scalable for larger projects.
- A need arises for an agent to automatically extract function signatures from Python functions and manage tool selection without user intervention.
Introduction to New Modules
- The speaker introduces new modules added to the repository, specifically within the "tool pattern" folder, which includes three key components: `tool_agent`, `tool`, and `utils`.
- Focus shifts to the `tool` module, which implements a method for extracting function signatures from Python functions.
Tool Class Implementation
- The `tool` class is defined with three attributes: name, function, and function signature. The signature is generated by a specific method while the function attribute holds the callable Python function.
- A decorator is introduced that transforms a standard Python function into a tool object by generating its signature and defining necessary attributes.
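A minimal sketch of that decorator and class, assuming attribute names `name`, `fn`, and `fn_signature` (the course's exact names may differ). The signature is built from the function's annotations via `inspect`:

```python
import inspect
import json

class Tool:
    """Sketch of a tool wrapper: a name, a callable, and a text signature for the LLM."""
    def __init__(self, name, fn, fn_signature):
        self.name = name
        self.fn = fn
        self.fn_signature = fn_signature

    def run(self, **kwargs):
        return self.fn(**kwargs)

def get_fn_signature(fn) -> str:
    # Build a JSON description from the function's name, docstring, and annotations.
    params = {
        name: {"type": param.annotation.__name__}
        for name, param in inspect.signature(fn).parameters.items()
        if param.annotation is not inspect.Parameter.empty
    }
    return json.dumps({"name": fn.__name__, "description": fn.__doc__, "parameters": params})

def tool(fn):
    """Decorator turning a plain Python function into a Tool object."""
    return Tool(name=fn.__name__, fn=fn, fn_signature=get_fn_signature(fn))

@tool
def get_current_weather(location: str, unit: str) -> dict:
    """Return the current weather for a location (dummy data)."""
    return {"temperature": 25, "unit": unit, "location": location}
```

After decoration, `get_current_weather` is no longer a plain function but a `Tool` whose signature string can be dropped straight into the system prompt.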
Tool Agent Functionality
- The `tool_agent` can utilize multiple tools; it selects the appropriate one based on user queries and retrieves relevant information in natural language.
- Key attributes of the tool agent include generating client connections and defining a list of tools available for use during operations.
Run Method Process
- The `run` method processes user messages by converting them into prompts in the OpenAI-style chat format, then builds chat histories for both the tool system prompt and the agent.
- It then invokes the generation logic from previous implementations, ultimately retrieving the tool-call information.
Validation and Execution Steps
- After obtaining tool call details, validation checks ensure that argument types match expected input types before executing the selected tool.
- Results from running the tool are appended to chat history using observation prompts before making another call to retrieve final outputs.
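The validation step can be sketched as follows. The function name and dictionary layout are assumptions for illustration; the idea is simply to coerce each argument to the type declared in the tool signature before execution.

```python
def validate_arguments(tool_call: dict, tool_signature: dict) -> dict:
    """Coerce tool-call arguments to the types declared in the tool signature."""
    type_map = {"int": int, "str": str, "float": float, "bool": bool}
    properties = tool_signature["parameters"]
    for name, value in tool_call["arguments"].items():
        expected = type_map[properties[name]["type"]]
        if not isinstance(value, expected):
            # LLMs often emit numbers as strings, e.g. "5" instead of 5.
            tool_call["arguments"][name] = expected(value)
    return tool_call
```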
Conclusion Preview
- The speaker hints at demonstrating how all these classes work together effectively in practice as part of implementing everything correctly.
Implementation of a Tool Decorator
Overview of the Implementation
- The implementation is not intended to be a perfect framework but aims for clarity and ease of understanding. The focus is on creating something functional without overcomplicating it.
Fetching Top Stories from Hacker News
- A function has been implemented to fetch the top N stories from Hacker News, a popular platform for sharing articles and links. This function serves as a practical example rather than using dummy data.
- Hacker News features various types of stories, including articles, GitHub repositories, and tweets, making it widely used by many people. The goal is to retrieve a specified number of these stories effectively.
Transforming Function into Tool
- Demonstration begins with running the Python function to ensure it works correctly; fetching the top five stories confirms functionality with accurate results displayed on Hacker News.
- The transformation process involves utilizing previously covered methods to convert this Python function into a tool that can be accessed programmatically, complete with generated descriptions and parameter information.
Agent Interaction with Tools
- To instantiate the tool agent, only one tool (the hn tool) is required in this case. Testing begins by asking unrelated questions to confirm that the agent does not engage tools unnecessarily when they are irrelevant to user queries.
- When prompted about current top stories on Hacker News, the agent successfully utilizes the tool and returns understandable results instead of raw output from the function call itself. This demonstrates effective interaction between user input and tool usage within the agent's design.
Understanding the ReAct Technique in Planning Patterns
Introduction to the ReAct Technique
- The session shifts focus towards implementing a ReAct agent, using a technique known as ReAct (Reason and Act), which is unrelated to the front-end framework "React." This method aims at enhancing the planning capabilities of language models (LLMs).
Simplified Explanation of REACT Process
- A diagram illustrates how REACT operates through three main steps:
- Action Step: The agent decides to use a specific tool.
- Observation Step: It observes outputs generated by that tool.
- Reflection Step: Finally, it reflects on observations made before deciding whether to continue iterating or conclude its response process based on what was learned during execution.
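The loop sketched in the diagram can be written in a few lines. This is a toy illustration, not the course's agent: the `llm` callable and the inline tag parsing are hypothetical stand-ins for the real Groq calls and XML handling.

```python
import json

def react_loop(question: str, tools: dict, llm, max_rounds: int = 5) -> str:
    """Thought -> action -> observation loop; `llm` is any callable that takes
    the running history and returns a completion string (stubbed in tests)."""
    history = [question]
    completion = ""
    for _ in range(max_rounds):
        completion = llm(history)
        if "<response>" in completion:  # final answer produced: stop iterating
            return completion
        # Otherwise the completion carries an action as a JSON tool call.
        start = completion.index("<tool_call>") + len("<tool_call>")
        end = completion.index("</tool_call>")
        call = json.loads(completion[start:end])
        observation = tools[call["name"]](**call["arguments"])
        # Feed the observation back so the next round can reflect on it.
        history.append(f"<observation>{observation}</observation>")
    return completion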
ReAct Technique Implementation Overview
Introduction to the ReAct Technique
- The session begins with an overview of the technique known as "The Loop," which will be implemented in a Jupyter notebook.
- A brief introduction to the planning pattern and links to previous lessons are provided, emphasizing their relevance for today's implementation.
Relevant Imports and Setup
- The focus is on importing the necessary Python libraries and the Groq client, highlighting that no external frameworks like LlamaIndex or LangChain will be used.
- Utility functions are introduced, including the tool decorator from previous videos that transforms Python functions into tool objects, and a new function called `extract_content`.
System Prompt Configuration
- The system prompt for the ReAct technique is configured by instructing the language model (LLM) to operate in a loop structure: thought, action, observation.
- The importance of defining available tools within this system prompt is noted; these tools will be completed later in the process.
Example Session and Loop Execution
- An example session is introduced where the ReAct loop is defined. This includes asking a question enclosed in XML tags.
- Explanation of how LLM processes thoughts about questions before selecting tools based on observations made during execution.
Handling Responses and Constraints
- The response message format is discussed; if it returns a final answer (e.g., temperature), it indicates that the loop can stop.
- Additional constraints are added for unrelated queries where LLM should respond freely without invoking any tools.
Tool Creation and Implementation in the ReAct System
Overview of Tools Created
- Three basic tools were created:
- Sum Two Elements: Accepts two integers, A and B, and returns their sum.
- Multiply Two Elements: Multiplies the two integers A and B.
- Compute Logarithm: Computes the logarithm of an integer input X.
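The three tools as plain Python functions (in the course they are wrapped with the `@tool` decorator), together with the arithmetic the agent walks through. The operands 1234 and 5678 match the intermediate result 6912 reported later in the lesson; the logarithm is the natural log.

```python
import math

def sum_two_elements(a: int, b: int) -> int:
    """Return the sum of a and b."""
    return a + b

def multiply_two_elements(a: int, b: int) -> int:
    """Return the product of a and b."""
    return a * b

def compute_log(x: int) -> float:
    """Return the natural logarithm of x."""
    return math.log(x)

# The calculation the agent performs step by step in the lesson:
step_1 = sum_two_elements(1234, 5678)      # 6912
step_2 = multiply_two_elements(step_1, 5)  # 34560
step_3 = compute_log(step_2)               # ~10.45
```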
Defining Tools with Decorators
- Tools are defined using a tool decorator that transforms Python functions into tool objects.
- A dictionary is created to relate tool names with their respective tool objects, verifying functionality by accessing attributes like name and signature.
Integrating Tool Signatures into System Prompt
- The signatures for all three tools are concatenated into a variable for integration into the system prompt.
- The system prompt is printed to confirm successful addition of tool signatures.
User Interaction and Chat History Setup
- The implementation process begins with user interaction where a question about calculations is posed.
- Chat history is generated using the system prompt followed by the user's question formatted in XML tags.
Execution of First Tool Call
- The first message indicates that the agent needs to calculate sums before proceeding to multiplication and logarithm.
- The agent correctly identifies that summing numbers should be prioritized based on user instructions.
Extracting Tool Output for Further Processing
- An extraction method retrieves content from XML tags, specifically targeting string dictionaries returned by tools.
- Assertions are made to ensure that results match expected mathematical outcomes, confirming correct execution without errors.
Observations After Executing Actions
- Following each action, observations are recorded in chat history using appropriate XML tags. This step ensures feedback loops within the agent's processing logic.
Next Steps in Calculation Process
- After obtaining the sum result, the agent plans to multiply it by five as per user instructions.
- Arguments for this operation are set up logically based on previous outputs, demonstrating coherent thought processes within the agent's operations.
Understanding the ReAct Agent Implementation
Step-by-Step Execution of Mathematical Operations
- The speaker discusses running a multiplication check using the tool, specifically multiplying 6912 by 5, and confirms that everything is functioning correctly.
- After confirming the multiplication result, the next step involves calculating its logarithm. The tool call for this operation is `compute_log`, with arguments based on previous calculations.
- An assertion is made to ensure that the mathematical computation aligns with the tool's output, emphasizing the importance of maintaining observation XML tags throughout the process.
- In step seven, upon completion of all calculations, an XML tag indicating a response is generated, confirming that the loop has concluded successfully with a logarithm result of approximately 10.45.
Transitioning to Automated Implementations
- The speaker highlights that manually implementing each iteration isn't efficient; thus, they will demonstrate how to automate these processes in VS Code.
- Instructions are provided on navigating to the source folder in VS Code and locating the relevant agent patterns folder for planning pattern implementation.
Exploring Code Structure and Logic
- Within the code module for the ReAct agent, familiar elements such as system prompts from previous implementations are noted as being consistent across both versions.
- Key differences in code structure are introduced; attributes like the Groq client and model settings are now defined as class attributes rather than the inline variables used previously.
Core Functionality and Validation Mechanisms
- The method that adds tool signatures to the system prompt is more refined than the earlier notebook implementation, showcasing improved efficiency in handling tools.
- A detailed explanation of processing tool calls reveals it validates arguments before executing them—this prevents errors when incorrect data types are inputted.
Finalizing Agent Logic and Loop Control
- The observations variable plays a crucial role in linking back to initial diagrams presented at the video's start, ensuring clarity in results collection.
- The run method, common across the various agents, includes an internal loop specific to the ReAct logic. It checks for response XML tags to determine whether further processing is necessary or the loop can terminate early.
Implementation of the ReAct Technique
Overview of Tool Calls and Chat History
- The tool-call step is handled much like the thought step demonstrated in the Jupyter notebook. After executing a tool call, the chat history is updated with the assistant's completion.
- The system checks for any tool calls in the completion; if found, it processes these calls using a previously explained method.
- Results from tool calls are stored in an "observations" variable and printed in blue for clarity, aligning with the REACT diagram discussed earlier.
Loop Logic and Completion Handling
- The chat history is updated again with user observations after processing tool calls. This initiates another iteration of the loop.
- Two outcomes can occur: finding a response XML tag or reaching maximum defined rounds without finding one. If max rounds are exceeded, the current chat history completion is returned.
- The logic implemented is straightforward—a simple loop comprising about 20 lines of code that effectively executes the REACT technique.
Demonstration in Jupyter Notebook
- A demonstration begins by importing and instantiating the ReAct agent with three tools before running user messages to observe results.
- Initial outputs show improved readability through color coding; first thoughts involve calculating sums and multiplying results sequentially.
Observations from Calculations
- The first calculation sums 1234 and 5678, resulting in 6912. Subsequent thoughts lead to multiplying this result by five, yielding 34560.
- Finally, computing the logarithm of the previous result yields approximately 10.45, confirming successful step-by-step execution of the calculations.
Understanding Multi-Agent Patterns
Introduction to Multi-Agent Pattern
- Transitioning into discussing multi-agent patterns which divide tasks into smaller subtasks handled by different agents—each assigned specific roles like software engineer or project manager.
Approach and Framework Utilization
- Each agent focuses on solving simpler tasks contributing towards completing larger objectives collectively as part of a multi-agent application framework.
Comparison with React Technique
- Unlike single-agent systems where one agent utilizes multiple tools (as seen in REACT), here agents are simplified and focused on distinct tasks.
Resources for Further Learning
- Frameworks such as CrewAI may be unfamiliar; prior videos provide additional insight into their functionality within multi-agent applications.
Multi-Agent Framework Implementation
Introduction to Multi-Agent Pattern
- The speaker discusses the intention behind the videos, aiming for a more elaborate implementation of the multi-agent pattern from the start, rather than a simple version in Jupyter Notebook.
- A minimalist version of CrewAI is being developed, inspired by CrewAI's key abstractions, particularly agents and crews.
Inspiration from Airflow
- The design philosophy of Airflow is referenced, specifically its use of right shift and left shift operators to define task dependencies. This concept will be adapted to define dependencies between agents instead.
- Instead of using tasks as in Airflow, the implementation will focus on defining a crew session with various agents and their interdependencies.
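The Airflow-style operators can be sketched with Python's `__rshift__` hook. The `Agent` class here is a hypothetical skeleton (only the dependency wiring), not the course's full class:

```python
class Agent:
    """Minimal sketch of dependency wiring via >>, borrowed from Airflow."""
    def __init__(self, name: str):
        self.name = name
        self.dependencies = []  # agents this agent waits for
        self.dependents = []    # agents waiting on this agent

    def __rshift__(self, other: "Agent") -> "Agent":
        # `self >> other` means: `other` depends on `self`.
        self.dependents.append(other)
        other.dependencies.append(self)
        return other  # returning `other` allows chaining: a >> b >> c
```

With this in place, `poet >> translator >> writer` declares the whole pipeline in one line, exactly as Airflow declares task dependencies.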
Overview of Jupyter Notebook Structure
- The introduction includes links to previous lessons covering reflection patterns, tool patterns, and planning patterns before diving into the agent class within this mini crew AI framework.
- The agent class is essential for building a multi-agent framework; it serves as a foundational component for implementing agent functionalities.
Agent Class Implementation
- In VS Code, two modules are introduced: `agent` and `crew`, starting with an exploration of the agent module's imports, which include previously discussed abstractions like tools and ReAct agents.
- The initialization method requires parameters such as the agent name, backstory (similar to a system prompt), task description, expected output format, a list of tools defined earlier in the series, and the selected LLM (a Groq-hosted model).
Attributes and Dependencies
- Key attributes include dependencies (agents that this agent relies on) and dependents (agents that rely on this current agent). If dependencies fail, then this agent cannot execute its task successfully. Context refers to outputs from prior agents that inform current operations.
- A unique feature involves registering an agent within an active crew session; this registration process ensures proper management of multiple agents working together effectively within defined sessions.
Understanding Agent Context and Dependencies
Overview of Shift Methods
- The discussion introduces various shift methods, including right shift and left shift operators, which are essential for defining dependencies between different agents in the framework.
- These operators are borrowed from Airflow philosophy, emphasizing their role in establishing relationships among agents.
Dependency Management
- The right shift and left shift operators facilitate the definition of dependencies. However, the speaker notes a personal preference for using the right shift operator over the left.
- The `receive_context` method is crucial for receiving outputs from the agents this agent depends on. If there are no dependencies, it initializes with an empty string.
Creating Prompts
- A prompt is necessary even for ReAct agents. It comprises several elements: the task description, the expected output, and the context.
- The context attribute is linked to what was discussed earlier regarding `receive_context`, ensuring that all relevant information is included in the prompt.
Running Agents
- The main method (`run`) generates a prompt using `create_prompt` and executes the ReAct agent with it.
- After execution, outputs are passed to dependent agents through a loop that calls `receive_context`, ensuring results reach the right agents.
Creating Agents: Examples and Implementation
First Agent: Poet Agent
- An example agent named "poet agent" is introduced. Its backstory involves being a well-known poet tasked with writing about life's meaning.
- The expected output specifies that only the poem should be returned without any titles or introductory sentences.
Second Agent: Writer Agent
- A more complex example involves creating an agent equipped with tools. This "writer agent" specializes in writing text into `.txt` files.
- Utilizing a tool decorator similar to those found in other frameworks (like LangChain), this agent's function writes strings to specified file names.
Tool Functionality
- The tool function, defined as `write_string_to_txt`, takes two parameters, a string and a filename, and performs its task by writing the content into a file.
Multi-Agent Pattern and Dependencies
Overview of Tool Usage
- The tool discussed is designed to write strings into text files. The output file, named `tool_agent_sample.txt`, demonstrates the successful generation of a text file containing the phrase "this is a tool agent."
Defining Agents and Dependencies
- Two agents are defined: a poet agent and a poem translator agent skilled in ancient Greek. The translator's task is to translate poems into ancient Greek, with the expected output being solely the translated poem.
- Agent one (poet) serves as a dependency for agent two (translator), meaning that the translator relies on the output from the poet. This relationship establishes an important aspect of multi-agent patterns.
Contextual Output Between Agents
- When running agent one, its output becomes context for agent two, allowing seamless interaction between them. For example, if agent one outputs lines about "whispering winds," this context is passed to agent two for translation purposes.
- The system prompt used by agent two incorporates this context, ensuring it understands what was generated by the previous agent before proceeding with translation tasks.
Crew Class Functionality
- The crew class acts as an orchestrator within this framework, managing multiple agents through methods like `__enter__` and `__exit__`, which define session contexts where agents operate together effectively.
- A key attribute of the crew class is `current_crew`, which tracks the active context and ensures that agents are registered correctly within their operational environment. This helps maintain organization among the various agents during execution.
Managing Agent Execution Order
- To manage dependencies without conflicts during execution, a topological sort algorithm organizes agents based on their dependencies before they run. This prevents any issues arising from inter-agent dependencies during operation.
- Visualization tools such as graph-based plotting will be utilized later to illustrate how these agents interact within their directed graphs, enhancing understanding of their relationships and execution order in practice.
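The ordering step above can be sketched with Kahn's algorithm. The tiny `Agent` class here carries only the dependency links needed for sorting; it is illustrative, not the course's full implementation:

```python
from collections import deque

class Agent:
    """Just enough structure for ordering: a name plus dependency links."""
    def __init__(self, name):
        self.name = name
        self.dependencies, self.dependents = [], []

    def __rshift__(self, other):
        # `self >> other`: other depends on self.
        self.dependents.append(other)
        other.dependencies.append(self)
        return other

def topological_sort(agents):
    """Kahn's algorithm: order agents so each runs after its dependencies."""
    in_degree = {agent: len(agent.dependencies) for agent in agents}
    queue = deque(agent for agent in agents if in_degree[agent] == 0)
    ordered = []
    while queue:
        agent = queue.popleft()
        ordered.append(agent)
        for dependent in agent.dependents:
            in_degree[dependent] -= 1
            if in_degree[dependent] == 0:
                queue.append(dependent)
    if len(ordered) != len(agents):
        raise ValueError("circular dependency between agents")
    return ordered
```

The final length check is what catches cycles: if two agents depend on each other, neither ever reaches in-degree zero, so the ordered list comes up short.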
Poem Generation and Translation Process
Overview of the Agents Involved
- The process involves three main agents: the poet agent, the translator agent, and the writer agent. Each plays a distinct role in generating, translating, and writing a poem into a text file.
- A diagram illustrates the relationships between these agents, showing how each output serves as input for the next. This visual representation simplifies understanding their dependencies.
Implementation Insights
- The speaker notes that using diagrams to represent dependencies is clearer than diving into code implementations, which can be complex and messy. This approach aids in grasping how each component interacts within the system.
- The final step in their Jupyter notebook involves executing code to see how well each agent performs its task sequentially: generating an English poem followed by its translation into Spanish.
Poem Generation and Translation
- The first output from the poet agent is an English poem titled "In the Grand Tapestry of Time and Space," showcasing creativity before moving on to translation tasks.
- After translation, it is confirmed that the Spanish version accurately reflects the original poem's meaning, demonstrating effective language processing capabilities of the translator agent.
Final Output Creation
- The writer agent takes over by utilizing a tool to write the translated Spanish poem into a text file named `poem.txt`, completing this multi-agent collaboration successfully.
- The speaker expresses enthusiasm about this lesson as it encapsulates practical applications of previously learned concepts regarding agents working together to accomplish tasks efficiently, even if they are relatively simple in nature.