The Complete Agentic RAG Build: 8 Modules, 2+ Hours, Full Stack

Introduction to Retrieval-Augmented Generation (RAG)

Overview of RAG and Its Importance

  • RAG remains essential for grounding AI systems in private company data, as pre-training LLMs on private knowledge is challenging.
  • Current context windows, though growing, still cannot hold an entire private knowledge base per query, necessitating the use of retrieval-augmented generation.

Evolution from Naive RAG to Agentic RAG

  • The approach has shifted from naive RAG to agentic RAG, indicating a need for hybrid strategies beyond simple vector search.
  • New retrieval methods include text-to-SQL for structured data and text-to-Cypher for graph data, giving agents more flexible ways to fetch context.

Building an Agentic RAG App

Project Collaboration and Resources

  • The session will involve building an agentic RAG app using Claude Code, with resources available in a linked GitHub repository.
  • Participants can expect a feature-rich chat interface that employs advanced agentic RAG techniques by the end of the project.

Technical Stack and Flexibility

  • The build utilizes a vanilla Python backend without frameworks, a React frontend, Supabase for vector search, and Docling for document processing. Users can customize the tech stack as needed.
  • Both cloud AI models and local models are utilized; users can opt for air-gapped builds or cloud APIs based on hardware availability.

Course Structure and Audience Engagement

Target Audience and Learning Approach

  • No prior experience with Claude Code or coding is required; however, participants should be technically minded and eager to learn about APIs and databases.
  • The course emphasizes collaboration with Claude Code rather than complete delegation of tasks, fostering learning through active engagement during the build process.

Instructor Background

  • Daniel introduces himself as part of AI Automators alongside his brother Alan; they have extensive experience in developing production-grade RAG systems over the past year.

Philosophies of AI-Assisted Coding

Collaborative vs Delegative Approaches

  • Two philosophies exist: collaborative coding where users stay involved versus delegative coding where users walk away after setting parameters. This session focuses on collaboration to ensure learning throughout the process.

Project Specifications

  • Provided specifications outline tech stacks and features but leave room for participant input during planning stages with Claude Code to enhance learning outcomes.

Final Notes on Course Materials

Resource Availability

  • Additional code files from modules one and two are provided to assist participants if they encounter challenges during development.
  • Community members have exclusive access to the remaining module code files; however, sufficient resources are available without them.

Introduction to Agentic RAG AI Coding Series

Overview of Local Architecture

  • The video introduces the first installment of the Agentic RAG AI coding series, encouraging viewers to subscribe for future updates.
  • The local architecture consists of a codebase worked on with Claude Code and an Integrated Development Environment (IDE) for code visibility and modifications.

Front-End Technologies

  • React is employed as the primary toolkit for building interactive user interfaces, complemented by TypeScript for type safety.
  • Tailwind CSS and ShadCN UI library are used for styling, providing pre-built components that enhance professional appearance.
  • Vite serves as the build tool, converting code files into HTML, CSS, and JavaScript comprehensible by web browsers.

Back-End Technologies

  • Python with FastAPI is chosen as the back-end framework due to its popularity in AI development; it enables rapid communication between front end and back end.
  • Docling handles document parsing and can be run in Docker, extending the application's ingestion capabilities.

Local Server Configuration

  • The Vite dev server operates on localhost:5173, allowing real-time updates in the browser via hot reload when changes are made in the codebase.
  • The back end runs on a Uvicorn server accessible at port 8000; communication occurs over HTTP or SSE to stream messages effectively.

Database Integration

  • Supabase (supabase.com) is used as the hosted database service; changes to its data are reflected in the browser in real time.
  • Both local and cloud models are integrated into this project; LM Studio and various Qwen 3 models are mentioned alongside OpenAI APIs.

Production Architecture Overview

Deployment Process

  • Changes made locally are committed to GitHub, creating a remote version of the codebase that triggers builds or deployments based on infrastructure setup.
  • Front-end deployment can utilize services like Vercel or Cloudflare; configurations allow automatic builds upon new commits to repositories.

Accessing Deployed Applications

  • Users access applications via URLs such as rag.app; front-end servers deliver compiled files while back-end servers handle data requests using FastAPI.

Project Breakdown: Building an Agentic RAG Application

Project Structure

  • The project comprises eight modules detailed in a PRD document available in the repository.
  • Two main interfaces will be developed: a chat interface and an ingestion interface. Despite its simplicity, significant functionalities like conversation threads and memory management will be incorporated.

Creating a Chat Interface with Document Ingestion

Overview of the Project

  • The project involves creating an app shell linked with Supabase for document ingestion and file processing. This includes metadata extraction and various statuses of files as they are processed.
  • The initial phase consists of eight modules, starting with the creation of a chat interface that utilizes OpenAI's responses API. Users can opt for alternative APIs if preferred.

Module Breakdown

Module 1: Chat Interface Development

  • Focuses on building a chat interface and user authentication backed by row-level security in Supabase, ensuring data isolation among users. By the end, a basic chat setup will be operational.

Module 2: Data Ingestion Interface

  • Introduces data ingestion capabilities alongside backend flows using pgvector within Supabase to store document embeddings.
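Conceptually, what pgvector does in-database is rank stored chunk embeddings by similarity to the query embedding (its `<=>` operator computes cosine distance in SQL). A toy in-memory illustration of that ranking, with made-up vectors:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "vector store": pgvector would do this ranking in SQL, e.g.
# ORDER BY embedding <=> query_embedding (cosine distance).
chunks = {
    "install the dryer vent": [0.9, 0.1, 0.0],
    "warranty information":   [0.1, 0.9, 0.2],
}
query = [0.85, 0.15, 0.05]
ranked = sorted(chunks, key=lambda c: cosine_similarity(chunks[c], query), reverse=True)
```

In the real app the embeddings come from an embedding model and live in a Supabase table with a `vector` column.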

Module 3: Record Management

  • Develops a record manager to prevent duplication of files within the knowledge base, enhancing data integrity and organization.

Module 4: Metadata Extraction

  • Concentrates on extracting and filtering metadata to improve search accuracy across documents stored in vector databases, leading to better results during queries.

Module 5: Multifile Format Support

  • Integrates Docling for multi-file-format support, utilizing its standard pipeline while allowing local or cloud-based VLM (Visual Language Model) usage based on available hardware resources.

Advanced Features Implementation

Module 6: Hybrid Search & Re-ranking

  • Implements hybrid search techniques combining semantic search with keyword searches, enabling more robust querying capabilities through local or cloud re-ranking models.
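One common way to fuse semantic and keyword result lists is Reciprocal Rank Fusion (RRF); the transcript doesn't name the exact fusion method used, so this is a generic sketch:

```python
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked document-ID lists from multiple retrievers.

    Each document scores sum(1 / (k + rank)) across the lists;
    k=60 is the value commonly used in the RRF literature.
    """
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["doc_a", "doc_b", "doc_c"]   # vector-search order
keyword  = ["doc_b", "doc_d", "doc_a"]   # keyword/full-text order
fused = reciprocal_rank_fusion([semantic, keyword])
```

A re-ranking model (local or cloud) would then re-score the fused top-N against the query text.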

Module 7: Agent Tool Development

  • Builds additional tools for agents including web search functionalities and text-to-SQL capabilities while ensuring database-level security by restricting access to specific tables only. This prevents potential data loss from unrestricted access.

Module 8: Sub-Agent Creation

  • Develops sub-agents, similar to Claude Code's Explore agent, that give tasks like document analysis their own LLM (Large Language Model) context, keeping the main agent's context window focused on user queries.

Observability & Development Process

  • LangSmith is used throughout all modules for observability; it tracks LLM calls, prompts passed, tool usage, and the memory management crucial for system functionality and debugging.
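As a rough picture of what such tracing captures per call (LangSmith's SDK does this automatically; this stand-in just logs to a list):

```python
import functools
import time

TRACE_LOG: list[dict] = []

def traced(fn):
    """Record each LLM/tool call's name, kwargs, latency, and output size.

    An illustrative stand-in for automatic SDK tracing, not LangSmith's API.
    """
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        TRACE_LOG.append({
            "call": fn.__name__,
            "kwargs": kwargs,
            "latency_s": round(time.perf_counter() - start, 4),
            "output_chars": len(str(result)),
        })
        return result
    return wrapper

@traced
def fake_llm(prompt: str = "") -> str:
    # Stand-in for a real model call.
    return f"echo: {prompt}"

fake_llm(prompt="hello")
```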

AI Development Loop

  • The development process follows an AI dev loop built around a PRD (Product Requirements Document): features are planned using Claude Code's plan mode, then built, then validated.
  • This iterative approach refines features based on testing feedback before committing changes and moving to the next phase in the PRD cycle.

Version Control Practices

  • For simplicity in this course context, version control will utilize a single main git branch rather than feature branches; however, future discussions will cover formal version control practices including branching strategies and deployment across environments like dev, staging, and production.

Getting Started with Claude Code and IDEs

Essential Software for Project Setup

  • The first software needed is Claude Code, along with an Integrated Development Environment (IDE) for better visibility of files during project creation. Options include Cursor, Google Antigravity, or VS Code.
  • Source files necessary to kick off the project are available on GitHub. Users should clone the repository using the provided URL, which contains all setup markdown files required for Claude Code.

Cloning the Repository

  • The cloning process can be done in various ways; however, it is recommended to use the IDE itself. In this case, Cursor is used to clone the repo by selecting a destination folder.
  • After cloning, users can open a sidebar in Cursor to view files in the repository. This step is crucial for understanding code progression even without extensive coding knowledge.

Terminal Usage and Configuration

  • Users can open a terminal within their IDE folder and run Claude Code commands directly. However, there may be performance issues when using Claude Code in terminal mode.
  • It’s advisable to configure a status line that displays model usage and context consumption percentages. Monitoring these metrics is essential for effective context management while using AI coding tools.

Project Structure Overview

  • The global rules file (claude.md) outlines tech stack details and engagement rules such as avoiding frameworks and enforcing row-level security.
  • Planning functions will save plans into an agent plans folder with specific naming conventions. A development flow of planning, building, validating, and iterating is also described.

Progress Tracking and Documentation

  • A progress.mmd file tracks project status through stages: not started, in progress, or completed. This helps new agents quickly understand where they need to continue.
  • The Product Requirements Document (PRD) defines modules' scope and phases of build including app shell development and memory building tasks across different modules.

Commands Overview

  • Two main commands are highlighted: a build command that executes tasks sequentially while updating progress documentation; an onboarding command designed to introduce new agents to project context effectively.
  • Triggering the onboarding command scans the project structure and git logs to establish the current state before starting module one, which builds out the app shell with user management via Supabase.

Creating a Chat Interface with OpenAI

Overview of the Project

  • The goal is to develop a chat interface on top of OpenAI's APIs, replicating the OpenAI Playground's ability to manage and query files through a vector store.
  • Users have the option to use various models, including local ones or APIs like Gemini, depending on their preferences for building wrappers around responses.

Planning Module One

  • The planning phase begins by entering plan mode, which helps maintain context while launching a sub-agent for task management. This ensures efficient token usage.
  • Plans are saved in a designated folder for future reference, aiding onboarding of new agents who can review previous plans. This promotes continuity in project development.

Task Execution Strategy

  • The initial plan outlines medium complexity tasks involving backend setup followed by client and database configurations; breaking these into smaller tasks may be beneficial.
  • A decision is made to clear the session and start fresh due to high token consumption from prior activities, ensuring optimal resource allocation moving forward.

Building Tasks and Progress Tracking

  • Upon initiating the build command with a clean context, 14 tasks are created within the module, showcasing real-time progress as files are generated in an IDE environment.
  • Understanding how code components fit together is emphasized; users do not need deep knowledge but should maintain a mental model of overall structure and functionality.

Utilizing IDE Features

  • An IDE's agent mode can provide explanations about file structures post-backend completion, enhancing user comprehension of application evolution and task tracking capabilities.
  • Users can run commands selectively based on permissions set in a configuration file; this allows for controlled execution while maintaining oversight on command inputs during task progression.

Finalizing Tasks and Next Steps

  • After approximately 13 minutes of building tasks, updates are made to reflect progress; however, some validation steps were overlooked regarding environmental variables that need addressing before finalization.

Supabase and OpenAI Integration Setup

Setting Up Supabase Credentials

  • The project requires setting up environment variables, specifically for Supabase credentials. The speaker opts to use supabase.com for this project.
  • A free account on Supabase allows the creation of up to two projects, which may be paused after a period of inactivity.
  • Essential keys include the Supabase URL, anon key, and service role key. Users are advised to use the legacy API keys found in the API settings.

Acquiring Additional API Keys

  • An OpenAI API key is necessary for module one; alternatives like local LLMs can be used from module two onwards.
  • A LangSmith API key is important for tracing LLM calls between the custom app and OpenAI. A free account setup is recommended.
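These keys typically end up in a backend `.env` file; the variable names below are illustrative assumptions and should match whatever names the generated code actually reads:

```
# backend/.env -- illustrative names; never commit this file.
SUPABASE_URL=https://your-project-ref.supabase.co
SUPABASE_ANON_KEY=your-anon-key
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key
OPENAI_API_KEY=sk-...
LANGSMITH_API_KEY=lsv2-...
```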

Validation Steps and Project Management

  • The speaker discusses creating an environment file (env file) in the backend folder with applied API keys that remain hidden from view.
  • SQL migrations need to be run in the Supabase SQL editor; there's some back-and-forth regarding project IDs and account logins during this process.

Optimization of Development Process

  • It’s suggested that sub-agents could have been utilized to perform tasks in parallel, potentially saving time during feature builds.
  • Token usage approaches 100,000 tokens; strategies are discussed for managing token limits effectively by clearing sessions when necessary.

Streamlining Future Interactions

  • Emphasis on starting fresh sessions with new agents while maintaining progress logs to avoid redundancy in efforts.
  • Discussion about creating startup scripts to automate service initiation processes, reducing repetitive manual commands during development.

User Authentication and Database Setup

Setting Up User Credentials

  • The speaker discusses encountering multiple permission prompts, indicating a need to add Playwright settings in Supabase.
  • A test user is created with a suggested password, and the speaker expresses willingness to share these credentials for testing purposes.
  • Validation of the user login flow is confirmed as successful, with threads visible on the left side of the application interface.

Database Migration and Chat Interface Testing

  • The migration process for database scripts is initiated, confirming that tables for threads and messages are successfully created in Supabase.
  • Initial tests of the chat interface show that responses from an OpenAI assistant are functioning correctly within the context window limits.

Manual Testing and Bug Fixing

  • Completion of module one validation leads to manual testing; messages between users and assistants are tracked effectively in Supabase.
  • Issues arise where further messages cannot be typed after sending one; this indicates potential bugs needing resolution.

Troubleshooting API Versions

Identifying Version Issues

  • The speaker notes that LangSmith tracing isn't showing expected results, leading to the discovery of an outdated version of the OpenAI API in use.
  • An upgrade to the current OpenAI Responses API is initiated, emphasizing the importance of thoroughly reviewing product requirement documents before implementation.

Vector Store Creation

  • After linking to the updated OpenAI Responses API, a vector store named "CC rag" is created for file uploads.
  • Files can be added to this vector store; once uploaded, their IDs are linked back into cloud code configurations.

Service Management and Architecture Understanding

Restarting Services Effectively

  • The speaker decides to manually restart backend services rather than relying on automated commands due to frequent needs for service management.
  • A command script is identified for restarting all services efficiently while ensuring no background processes remain active.

Context Limitations and Session Management

  • Concerns about context limits arise as they reach 55%, prompting considerations for clearing sessions or onboarding agents again for bug resolution.
  • Multiple instances of agents running simultaneously lead to confusion during testing; adjustments are needed in service management scripts.

Final Thoughts on Productivity Hacks

Enhancing Workflow Efficiency

  • The discussion highlights productivity hacks such as running multiple agent instances concurrently to tackle different problems effectively.

Understanding Parallel Task Management in Code Development

Efficient Use of Time in Bug Fixing

  • The speaker discusses the importance of managing time effectively by working on multiple features or bug fixes simultaneously, rather than focusing solely on one task.
  • A specific example is given where a bug fix has been running for over 8 minutes, highlighting the need to track progress while also considering parallel tasks.

Troubleshooting and Validation

  • An error occurs when attempting to retrieve information about a refrigerator from the vector store, indicating a need for validation within the system.
  • The speaker initiates another instance of Claude to work on front-end changes while troubleshooting existing issues, demonstrating multitasking in development.

Front-End Changes and User Experience Enhancements

  • Suggestions are made for improving user experience, such as dynamically generating thread titles based on initial messages and implementing a loading icon during backend processing.
  • The speaker expresses frustration with ongoing tracing issues but continues to make front-end adjustments to enhance functionality.

Addressing Technical Challenges

  • The discussion shifts to unresolved tracing problems with LangSmith, prompting the use of plan mode to explore potential solutions through multiple agents.
  • Consideration is given to legacy issues stemming from previous specifications that may affect current operations, emphasizing the importance of version checks.

Progress Updates and System Functionality

  • Successful implementation of dynamic titles and a stop button indicates progress in front-end changes; this enhances user interaction with the application.
  • Confirmation that the vector store is now accessible via the responses API allows users to query it effectively, showcasing improved integration.

Moving Beyond Current Limitations

  • Plans are discussed for transitioning away from OpenAI's managed services toward more customizable options like OpenRouter or local AI servers for better control over data management.
  • Emphasis is placed on building the foundational elements of a retrieval-augmented generation (RAG) system using pgvector within their database setup.

Finalizing Module One and Commit Practices

  • Despite significant progress, unresolved tracing issues remain. The speaker utilizes advanced debugging tools while continuing front-end modifications.
  • A successful stream response alongside registered traces marks completion of module one; however, there’s an acknowledgment of insufficient commit practices throughout development.

Initializing a Git Repository and Committing Code

Setting Up the Repository

  • The speaker discusses initializing a Git repository from Cursor, emphasizing the importance of committing the codebase as part of the first phase.
  • It is highlighted that env files should not be committed and are kept out via Git's ignore rules. The speaker confirms that all changes have been successfully committed and pushed to GitHub.

Version Control Importance

  • The speaker introduces their personal private repository named "CC rag tutorial," noting its distinction from a public version linked in the description. They stress the significance of version control for potential rollbacks.
  • Pushing to GitHub ensures a remote copy exists, with privacy being emphasized as crucial.

Managing RAG Systems with Document Uploading

Document Management

  • The speaker mentions uploading product manuals into an OpenAI storage bucket, indicating they will work with these documents throughout the project.
  • A demonstration follows where specific appliance manuals are uploaded to test how well managed RAG systems can respond to user queries about installation processes.

Limitations of Managed RAG Services

  • The response from OpenAI's AI is described as slow but generally accurate when compared to manual instructions. However, discrepancies in step counts are noted.
  • Concerns arise regarding transparency in managed RAG services; users cannot see how information is retrieved from multiple documents.

Building Custom RAG Pipelines

Transitioning Away from Managed Services

  • The speaker outlines plans to develop an end-to-end retrieval augmented generation (RAG) pipeline independently, allowing for greater control over data management and model selection.
  • Cost implications of using managed services like OpenAI are discussed, including charges per gigabyte and API calls which could accumulate significantly over time.

Planning Phase Two: Enhancements and Features

Release Creation and Future Planning

  • After committing everything to git, a release tagged 0.1 is created for phase one. This includes source code and migration scripts necessary for setup.

Module Development Insights

  • For phase two planning, onboarding agents begins with exploring current integrations while managing token consumption effectively without blocking main operations.
  • Plans include creating an ingestion UI for file uploads similar to previous demonstrations, along with developing pipelines for automated file retrieval from various sources.

Key Considerations Moving Forward

  • Questions arise regarding embedding generation configurations and module implementation breakdown strategies as development progresses towards more complex functionalities.

Plan Execution and Parallelization

Overview of the Plan

  • The discussion revolves around creating a single plan that utilizes multiple general-purpose sub-agents to execute tasks efficiently. This approach aims to enhance productivity by running tasks in parallel.
  • The speaker emphasizes the importance of saving the plan into an accessible folder, indicating that manual copying and pasting is also an option for flexibility.

Analyzing and Modifying the Plan

  • A review of the plan reveals that certain elements can be executed simultaneously, specifically two builds at once, while noting dependencies between phases one, two, and three. This segmentation is crucial for speeding up processes.
  • Communication with Claude is mentioned as a method to refine the plan further before initiating execution. The command used to start the build process from this modified plan is highlighted as well.

Execution of Tasks

  • Initial attempts show that sub-agents were not utilized effectively; however, adjustments lead to successful parallel task execution with two agents working concurrently on different tasks. This marks a significant improvement in efficiency for module building.
  • Verification steps are taken post-execution using Playwright to ensure interface functionality remains intact after modifications have been made. Refreshing the interface shows new features like a documents tab being successfully integrated into the system workflow.

Document Ingestion and Processing

Uploading Documents

  • Testing begins with uploading various document types (text or markdown), confirming system restrictions on file types during ingestion processes. Successful uploads are tracked through processing feedback from the system interface.
  • After uploading a test document, metadata such as file name and chunk index are confirmed present within Supabase's database structure, indicating effective data handling post-ingestion.
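A minimal chunker showing how fields like file name and chunk index get attached to each chunk record (the sizes and field names here are assumptions, not the app's exact schema):

```python
def chunk_document(text: str, file_name: str,
                   chunk_size: int = 200, overlap: int = 50) -> list[dict]:
    """Split text into overlapping chunks, attaching per-chunk metadata
    like the file name and chunk index seen in the Supabase rows."""
    step = chunk_size - overlap
    chunks = []
    for i, start in enumerate(range(0, max(len(text), 1), step)):
        piece = text[start:start + chunk_size]
        if not piece:
            break
        chunks.append({
            "content": piece,
            "file_name": file_name,
            "chunk_index": i,
        })
    return chunks

records = chunk_document("a" * 450, "manual.md")
```

Each record would then get an embedding and be inserted into the pgvector-backed table alongside its metadata.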

Data Management Insights

  • Observations about user-specific storage paths reveal organized document management where each user has their own folder for uploaded files—this enhances security and accessibility across different user accounts within the application framework.
  • Deletion tests confirm real-time updates in both document tables and storage buckets when files are removed from user accounts, showcasing robust data synchronization within Supabase.

Testing User Isolation and Data Security

Acceptance Criteria Review

  • A thorough check against acceptance criteria highlights areas needing attention: UI improvements for chat history storage and model provider selection configurations are identified as necessary enhancements moving forward in development phases.

Regression Testing Necessity

  • The need for regression testing suites becomes apparent due to potential risks associated with ongoing changes in code—tracking test cases will help mitigate issues arising from AI coding alongside human development efforts throughout project phases.

User Account Functionality Tests

Creating New Users

  • New user creation tests demonstrate successful isolation of data between accounts; documents uploaded under one account do not appear under another account upon logging in—a critical feature ensuring privacy among users' information.

Understanding Table Structures

  • Emphasis on understanding table structures reinforces its importance for developers; knowledge about how data is separated across tables aids in maintaining security policies effectively while managing authentication protocols.

Project Update and Bug Fixes

Addressing UI Issues

  • The speaker discusses the complexity of handling long lists in a single prompt, suggesting that a plan may be necessary for extensive tasks.
  • A request is made to fix broken layout issues in the document tabs, with a screenshot taken for reference.

Module 2 Updates

  • The speaker mentions needing a configuration interface accessible from the user menu to select models and providers, which was specified but not implemented.
  • There are residual database columns related to OpenAI responses that need removal; this has been partially addressed in migration scripts.

Real-Time Document Management

  • An issue is identified where deleted documents remain visible until the page refreshes, indicating a need for real-time updates in the documents interface.
  • Plans are discussed to build a validation test suite covering all application features to ensure new changes do not break existing functionality.

Testing and Development Practices

  • The importance of early implementation of testing suites is emphasized to avoid monumental tasks later as feature sets grow.
  • Current testing methods using Claude's vanilla plan mode lack structured test-driven development; building out regression tests is prioritized.

Progress on Bug Fixes and Testing Suite

  • Four task agents are running bug fixes while also progressing on browser testing within the test plan.
  • Instructions will be added to claude.md for future agents regarding updating the test suite when new features are developed.

Finalizing Changes and Security Concerns

  • After completing bug fixes, it’s noted that real-time connections now work correctly, although some layout issues persist.
  • Concerns arise about API keys being stored as plain text in the database, highlighting security best practices that need implementation.

Validation Test Suite and API Key Management

Overview of the Validation Test Suite

  • The validation test suite has been completed, and claude.md has been updated accordingly. The focus is on agent validation and the full test suite.
  • There are concerns about how frequently to run these tests due to their potential growth into hundreds or thousands, which could impact usage budgets.

Security Issues and Global Settings

  • A plan is being developed to address security issues related to API keys stored as plain text in the database.
  • The proposal includes changing user settings to global settings to avoid conflicts with different LLMs and embedding models across users.

User Profiles and Admin Access

  • Discussion on creating a user profiles table where each new user automatically gets a profile, including an admin role for accessing settings.
  • An issue arises where the admin user cannot access the settings panel; debugging tools help identify this problem effectively.

Testing Functionality End-to-End

  • After setting up an API key, testing begins with a new chat feature that fails due to missing API keys.
  • Successful integration of an OpenRouter key allows for model selection; however, issues arise when trying to change embedding models due to existing chunks in the vector database.

Document Uploading Challenges

  • Attempting to upload documents reveals errors linked to hardcoded dimensions in the database schema.
  • Adjustments are made allowing for variable dimensions in embeddings, leading to successful document uploads with embeddings generated from selected models.

Local Model Testing

  • Local AI server setup is discussed, showcasing various models available for testing. A context window of 70,000 is set for optimal performance during document processing.

Samsung Electric Dryer Overview

Initial Impressions and Setup

  • The speaker discusses the Samsung electric dryer, noting its fast streaming capabilities during a demonstration.
  • A text file was uploaded for testing purposes, confirming that the system works locally using LM Studio.

Hardware Specifications

  • The setup includes an RTX 5090 GPU with 32 GB of VRAM, providing ample memory for processing tasks.
  • Context length is set to 70,000 tokens, which is at the limit of GPU memory capacity; adjustments can be made if needed.

Version Control and Module Development

Committing Changes

  • The speaker reviews changes in module 2 before committing them to master on GitHub, highlighting the importance of database migration scripts.
  • A tag (v0.2) is created for version control before pushing changes to GitHub.

Planning Module 3: Record Manager

  • Discussion shifts to planning module 3 focused on creating a record manager to prevent duplicate file uploads.
  • Emphasis on avoiding duplication in vector storage as it complicates retrieval processes.

Implementing Features in Module 3

Hashing Algorithm Necessity

  • A hashing algorithm is proposed to identify unchanged files and prevent duplicates from being stored in the vector store.
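
The proposed hashing can be as simple as SHA-256 over the raw file bytes; this sketch assumes the hash is stored per document and compared on re-upload:

```python
import hashlib

def content_hash(data: bytes) -> str:
    """SHA-256 of the raw file bytes; identical files hash identically."""
    return hashlib.sha256(data).hexdigest()

# On re-upload, skip ingestion when the stored hash matches the new file.
stored = content_hash(b"user manual v1")
unchanged = content_hash(b"user manual v1") == stored   # skip re-ingestion
changed = content_hash(b"user manual v2") != stored     # reprocess
```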

Incremental Updates Strategy

  • Plans include enabling incremental updates where only modified content will be reprocessed while preserving unchanged chunks.
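
One plausible shape for the incremental update, assuming chunks are compared by content hash (the helper names are illustrative, not the project's actual code):

```python
import hashlib

def chunk_hashes(chunks: list[str]) -> dict[str, str]:
    """Map each chunk's SHA-256 hash to its text."""
    return {hashlib.sha256(c.encode()).hexdigest(): c for c in chunks}

def diff_chunks(old: list[str], new: list[str]):
    """Return chunks to embed/insert and stale hashes to delete."""
    old_h, new_h = chunk_hashes(old), chunk_hashes(new)
    to_add = [c for h, c in new_h.items() if h not in old_h]
    to_delete = [h for h in old_h if h not in new_h]
    return to_add, to_delete

old = ["intro", "installation", "warranty"]
new = ["intro", "installation steps", "warranty"]
to_add, to_delete = diff_chunks(old, new)  # only the edited chunk is touched
```

Unchanged chunks keep their existing embeddings, so only modified content pays the re-embedding cost.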

Planning Module 4: Metadata Extraction

Parallel Development Considerations

  • The speaker considers running module 4 (metadata extraction) concurrently with module 3 due to their differing functionalities.

Metadata Filtering Implementation

  • Questions arise regarding how metadata filtering should be implemented; suggestions include an expandable detail panel per document.

Finalizing Plans and Testing

Structured Metadata Goals

  • The goal for module 4 is defined as extracting structured metadata during ingestion and ensuring retrieval supports metadata filtering.

Database Migration Needs

  • A database migration script will be necessary to accommodate new fields related to metadata extraction.

Interface Design Considerations

  • Discussion about creating an interface for admins to define metadata keys, including data types and requirements.

File Upload and Metadata Configuration

Testing File Upload Functionality

  • The speaker conducts a smoke test by attempting to upload the same file again, noting that the document was not updated due to an unchanged content hash.
  • An additional file is uploaded successfully, indicating that the upload functionality is working as intended.

Metadata Configuration Discussion

  • The speaker discusses how configurable the metadata should be, suggesting a database store with a JSON field instead of hardcoding values.
  • Plans are updated and saved in the global settings, indicating progress on metadata configuration.
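
A minimal sketch of the configurable-metadata idea, assuming the schema lives as a JSON document in a settings table rather than being hardcoded (field names here are hypothetical):

```python
# Admin-defined metadata schema, stored as JSON in global settings.
metadata_schema = {
    "document_type": {"type": "string", "required": True},
    "topics": {"type": "list", "required": False},
}

def validate_metadata(meta: dict, schema: dict) -> list[str]:
    """Return a list of validation errors (empty list means valid)."""
    errors = []
    for key, spec in schema.items():
        if spec["required"] and key not in meta:
            errors.append(f"missing required field: {key}")
    return errors

ok = validate_metadata({"document_type": "manual"}, metadata_schema)
bad = validate_metadata({}, metadata_schema)
```

Because the schema is data, admins can add or retire metadata keys without a code change.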

Metadata Processing and AI Interaction

Document Processing Insights

  • After refreshing, the system processes a text file and generates an LLM title and summary while applying document type and topics.
  • The metadata schema within Supabase describes what can be included in these fields but may cause issues when fetching responses.

Error Handling in AI Queries

  • When querying about a Samsung electric dryer, no specific information is returned due to overly conservative metadata filters.
  • A database error occurs related to duplicate functions rather than agent calls; this highlights ongoing bug hunting efforts.

Refining Metadata Filters

Challenges with Metadata Filters

  • Tracing reveals that the AI triggered a search without passing any metadata filters, leading to potential overzealous filtering resulting in zero results.
  • The importance of system prompting is emphasized; agents need clear instructions on which metadata filters to apply under specific circumstances.

Adjusting Search Parameters

  • The speaker attempts to refine searches by specifying topics associated with documents but encounters issues with incorrect document types being passed.
  • This misalignment leads to no results being returned; adjustments are needed for both prompts and filter settings.

Finalizing Features and Planning Next Steps

Completion of Current Feature Set

  • With feature development nearing completion, there’s a call for creating git commits for outstanding work related to modules three and four.

Planning Future Modules

  • The transition to module five focuses on multiformat support, particularly PDFs, using DocLing for flexibility.
  • Module six planning begins concurrently; it will address hybrid search implementation alongside other features.

Understanding DocLing's Capabilities and Development Progress

Initial Setup and Dependencies

  • The discussion begins with PyTorch as a required dependency for DocLing, a substantial download at around two gigabytes; DocLing also downloads machine learning models on first use.
  • The speaker mentions using a gaming laptop with 8 GB of VRAM to run smaller machine learning models, while an AI server with 32 GB of VRAM will be utilized for VLM models.

Document Management Features

  • A query arises regarding whether a delete-document feature exists; it is clarified that the front end has a delete button that removes records from Supabase storage.

Module Development Updates

  • After taking a break, the focus shifts back to building out module five and planning module six. The process involves planning, building, testing, and bug fixing.
  • For module five, multiformat support is introduced with text extraction as the primary focus. The maximum supported file size is set at 50 MB.

Search Pipeline Enhancements

  • Plans for replacing pure vector search with a hybrid search pipeline are discussed. This includes integrating keyword full-text search alongside vector search capabilities.
  • An option to utilize Cohere’s public API for re-ranking is proposed along with local re-ranking model options to enhance performance.
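
A common way to fuse keyword and vector rankings is Reciprocal Rank Fusion (RRF); the video doesn't state which fusion method the build uses, but a sketch of the idea looks like this:

```python
def rrf_fuse(vector_ids: list[str], keyword_ids: list[str], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: merge two ranked ID lists into one ranking.

    Each document scores 1/(k + rank) per list it appears in; documents
    found by both searches rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in (vector_ids, keyword_ids):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

vector_results = ["d3", "d1", "d2"]   # from pgvector similarity search
keyword_results = ["d1", "d4"]        # from Postgres full-text search
fused = rrf_fuse(vector_results, keyword_results)  # "d1" ranks first
```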

Testing and Validation Processes

  • Hybrid search functionality has been implemented successfully. Testing reveals that multiple documents can now be processed simultaneously.
  • A successful retrieval of information about water supply from documents demonstrates effective integration of LLM-generated metadata within module five.

Reranking Functionality Insights

  • Configuration settings for re-ranking are examined; it was noted that reranking must be enabled to see its effects in results returned by queries.
  • Comparison between results before and after enabling reranking shows significant differences in output metrics like similarity scores.
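
Conceptually, re-ranking re-scores the retrieved candidates against the query with a stronger model. The sketch below uses a trivial keyword-overlap scorer purely as a stand-in for Cohere's rerank API or a local cross-encoder:

```python
def rerank(query: str, docs: list[str], scorer) -> list[str]:
    """Reorder candidate docs by a (query, doc) relevance scorer."""
    return sorted(docs, key=lambda d: scorer(query, d), reverse=True)

def overlap_scorer(query: str, doc: str) -> float:
    """Toy scorer: fraction of query words appearing in the doc."""
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / max(len(q), 1)

docs = ["dryer installation guide", "water supply connection", "warranty terms"]
top = rerank("connect the water supply", docs, overlap_scorer)
```

Swapping `overlap_scorer` for a real reranker is the only change needed, which is why the feature can be toggled per query.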

Version Control and Future Testing Plans

  • The speaker plans to commit all changes made during development to GitHub under version tags v0.5 and v0.6 while preparing release notes to clarify the updates made.

Performance Observations

  • Initial tests on ingestion speed reveal some slowness when processing nearly 200 files simultaneously; potential improvements through asynchronous processing are considered.
  • Resource usage spikes indicate high RAM consumption due to running machine learning models concurrently with file uploads; discussions about optimizing processes or offloading tasks to a main server arise.

Document Parsing and Optimization Strategies

Utilizing Cloud Document Parsing Services

  • The speaker discusses options for document parsing, suggesting cloud services like LlamaParse, or OCR services such as Mistral and Datalab OCR, for those who prefer not to process locally.
  • For local processing, optimization is crucial; the speaker notes that Python is currently using about 8 GB of memory.

Improving Document Ingestion Process

  • A conversation with Claude is initiated to explore ways to streamline the document ingestion process, aiming to enhance speed while conserving server resources.
  • Module 7 focuses on building a text-to-SQL tool for querying structured data, emphasizing the need for a web search fallback in an agentic RAG system.

Identifying Performance Issues

  • An analysis reveals issues such as loading files twice into memory and lack of concurrency limits on background tasks, which could lead to crashes.
  • Recommendations include implementing optimizations and documenting them in the agents' plans folder while triggering multiple sub-agents to expedite processes.
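
A standard way to impose the missing concurrency limit is an `asyncio.Semaphore` around each ingestion task; this sketch assumes an async ingestion loop like the backend's (function names are illustrative):

```python
import asyncio

async def ingest(doc_id: str, sem: asyncio.Semaphore, results: list):
    async with sem:             # at most `limit` documents in flight
        await asyncio.sleep(0)  # stand-in for parse + embed + upsert
        results.append(doc_id)

async def run_all(doc_ids: list[str], limit: int = 10) -> list[str]:
    sem = asyncio.Semaphore(limit)  # caps concurrent ingestions
    results: list[str] = []
    await asyncio.gather(*(ingest(d, sem, results) for d in doc_ids))
    return results

processed = asyncio.run(run_all([f"doc-{i}" for i in range(25)]))
```

The semaphore gives the batching behaviour observed later: files load in bounded groups instead of all at once, keeping RAM usage flat.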

Testing Performance Improvements

  • The speaker suggests adding a "delete all" button to improve efficiency and considers restarting the server as a precautionary measure before reloading files.
  • Observations indicate improved performance with documents now loading in batches rather than all at once, reducing server strain.

Analyzing Bottlenecks in Processing

  • The speaker aims to identify bottlenecks within the pipeline—whether they stem from LLM calls or DocLing-side operations—and notes consistent processing behavior with three items always being processed concurrently.
  • After adjusting settings to allow ten concurrent ingestions, there’s slight improvement noted but still room for further iterations and investigation into underlying issues.

Module 7: Building Text-to-SQL Tools

Complexity of Adding New Tools

  • Module 7 involves medium complexity due to integrating multiple new tools following established patterns.
  • The discussion includes concerns over allowing arbitrary SQL generation which could expose sensitive user data; thus, predefined parameterized queries are recommended instead.

Implementing Database Query Solutions

  • Options are presented between using parameterized database queries versus raw SQL access. If raw SQL is chosen, strict read-only permissions will be enforced at the database level.

User Access Control Measures

  • A specific user account will be created with read-only access limited to one table (e.g., order history), preventing unauthorized actions like deletion across other tables.
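
Alongside the database-level grants, an application-side guard can reject anything that isn't a single SELECT. This is a minimal sketch of that defence-in-depth idea, not the project's actual validator:

```python
import re

# Keywords that must never appear in a read-only query.
FORBIDDEN = re.compile(r"\b(insert|update|delete|drop|alter|truncate|grant)\b", re.I)

def is_read_only(sql: str) -> bool:
    """Allow only a single SELECT statement with no destructive keywords."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:                      # reject stacked statements
        return False
    if not stripped.lower().startswith("select"):
        return False
    return not FORBIDDEN.search(stripped)

assert is_read_only("SELECT sum(amount) FROM sales WHERE customer = %s")
assert not is_read_only("DELETE FROM sales")
```

Even with this guard, the dedicated read-only Postgres user remains the real security boundary; the check just fails bad queries faster.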

Design Patterns for SQL Agents

  • Reference is made to Alan's video on SQL agents that covers various design patterns beyond just text-to-SQL implementations.

Module 7 and 8 Overview

Module 7 Plan Update

  • The updated plan for Module 7 involves implementing raw SQL with database-level security.
  • A link to the detailed plan will be provided, and the build command is initiated to kick off the process.

Introduction to Sub Agents in Module 8

  • In Module 8, a focus on sub agents is introduced, allowing them to load entire documents into context for better insight extraction. This addresses limitations of traditional RAG systems that often miss full document context.
  • The goal is to keep the main agent focused on user queries while enabling sub agents to conduct research or analysis as needed.

Development Considerations

  • The implementation of nested tool calls in the UI will require careful planning due to its complexity from both backend and frontend perspectives.
  • An exploration of the codebase is encouraged, even for non-coders, to understand how different components fit together within the system architecture. This includes separate folders for backend (Python) and frontend (React).

Backend and Frontend Structure

  • The backend consists of an app folder containing models, routers, services, and a database layer; understanding these elements aids in feature development and bug fixing.
  • The frontend structure includes a node_modules folder for local dependencies and source files organized into components, hooks, and pages, which are essential for building out the React application.

Finalizing Plans for Module 8

  • The implementation plan for sub agents includes creating a hierarchical agent architecture where main agents can delegate tasks like document analysis to isolated sub agents with specific capabilities. This will enhance user interaction by providing visible reasoning during analysis processes.
  • Opportunities for parallel execution of backend and frontend tasks are identified as part of this complex change initiative; updates will be made accordingly in the project plan.

Testing Phase Initiation

  • Before triggering builds or tests on new features from Modules 7 and 8, it’s important to clear existing contexts due to resource usage concerns (36% used already). Manual testing procedures are set up following server restarts.
  • A request is made to implement dark-mode functionality on the front end as a user-experience improvement. Testing also continues on database interactions, exercising the text-to-SQL functionality against the dummy sales data created earlier.

Sales Database Query Analysis

Overview of Sales Data Queries

  • The sales database query for "Metro Office" returned 12 rows, confirming the accuracy of the data with a total amount of 1124 for 25 units.
  • A separate SQL query for keyboard sales showed a total of 2399 for 30 units, which also appeared correct. However, running the same SQL command yielded only a single value (3524), raising questions about data retrieval discrepancies.

Investigating Query Discrepancies

  • The speaker expressed confusion over why their manual execution of the SQL query returned only one row compared to the tool's response of twelve rows.
  • It was noted that an auto-testing feature in Claude Code might be affecting results; context window limitations were also mentioned as a potential issue needing resolution.

Debugging and Code Review

  • Upon reviewing LangSmith responses, it became clear that there was a fallback mechanism in place that ignored specific queries if they failed, leading to all data being returned instead of just sums.
  • The speaker identified that a migration file had not been applied to the database, which could be causing issues with data retrieval.

User Permissions and Security Concerns

  • There was discussion around user roles and permissions; specifically, having read-only access on tables may have contributed to problems executing SQL commands correctly.
  • The need for dedicated Postgres users with limited access was emphasized to prevent overengineering solutions while ensuring security at the database level.

Finalizing Changes and Implementations

  • Acknowledgment was made regarding previous missteps in suggesting overly complex solutions; simplicity in creating dedicated users with select access is preferred.
  • Plans were discussed to obtain an API key from Tavily for web search functionality while considering alternatives like local search engines.

Environment Configuration Adjustments

  • The process involved searching through codebase settings to locate where API keys should be stored; discussions included whether these should reside in ENV files or settings pages.
  • Finally, steps were outlined to drop broken functions from Supabase and create the environment variables needed for database connectivity.

User Authentication and Database Connection Setup

Password Management and User Access

  • A password is required for user access, which has been added to the SQL reader database URL string.
  • The speaker mentions that Claude is looking for the password, suggesting a need for direct action rather than reliance on others.

Database Connection Issues

  • There are issues with the Postgres connection; it was attempting to connect through the wrong port (5432 instead of 6543).
  • After updating the connection settings in the environment variables (ENV), security measures are enforced at the database level.
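
The port fix can be verified programmatically. The sketch below assumes a Supabase-style connection string; the host, user, and password are placeholders, not the project's real credentials:

```python
from urllib.parse import urlparse

# Supabase's connection pooler listens on 6543, while direct Postgres
# connections use 5432; the bug was a URL pointing at the wrong one.
db_url = "postgresql://sql_reader:secret@db.example.supabase.co:6543/postgres"

parsed = urlparse(db_url)
assert parsed.port == 6543          # pooler port, not 5432
assert parsed.username == "sql_reader"
```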

Querying Data from Sales Table

Successful SQL Queries

  • The total value of all orders placed by "metro office" is confirmed as 3,524.
  • A successful query retrieves all order data, showing 12 records available in response to a request.

Potential for Expansion

  • The speaker notes that while current queries work well, there’s potential for more complex operations as tables become normalized or denormalized.

Web Search Functionality Testing

Web Search Execution

  • A web search for weather information in Galway, Ireland is initiated but encounters rendering issues due to context window limitations.
  • Despite technical difficulties, the web search successfully returns relevant data about weather conditions.

Document Summarization Challenges

Document Retrieval Issues

  • An attempt to summarize a large document (three and a half megabytes, 38 pages long) fails due to metadata filtering constraints.

System Prompt Adjustments

  • The system prompt needs updates to improve how documents are searched and retrieved without overly conservative filters affecting results.

Enhancing Agentic Capabilities

Tool Call Strategy Development

  • There's an emphasis on developing a retrieval strategy within AI agents that allows them to execute multiple tool calls effectively.

Progressive Query Handling

  • The goal is for agents to build up responses progressively through tool calls similar to human-like reasoning processes.
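
The progressive tool-calling idea reduces to a loop: call the model, execute any requested tool, feed the result back, and stop when the model answers directly. A toy sketch with stand-in functions (`call_llm` and `run_tool` are placeholders for the real API and tool dispatch):

```python
def agent_loop(question, call_llm, run_tool, max_tool_rounds: int = 5):
    """Invoke tools round by round until the model gives a direct answer."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_tool_rounds):
        reply = call_llm(messages)
        if "tool" not in reply:          # no tool requested -> final answer
            return reply["content"]
        result = run_tool(reply["tool"], reply["args"])
        messages.append({"role": "tool", "content": result})
    return "Stopped: maximum tool rounds reached."

# Toy stand-ins: round one asks for a search, round two answers from it.
def fake_llm(messages):
    if any(m["role"] == "tool" for m in messages):
        return {"content": "The dryer needs a 240V outlet."}
    return {"tool": "document_search", "args": {"query": "dryer voltage"}}

answer = agent_loop("What outlet does the dryer need?", fake_llm,
                    lambda tool, args: "240V per the manual")
```

The `max_tool_rounds` cap is the configurable parameter discussed later in the build.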

30 Billion Mixture of Experts Model

Overview of the Model

  • The discussion revolves around a 30 billion mixture of experts model, which effectively utilizes only six or seven billion parameters at any given time.
  • This model is suggested to be nearing the limits of what local models can achieve in terms of performance.

Retrieval Strategy Development

  • A hybrid search was executed using product codes and NAICS, successfully retrieving relevant documents such as a 24-inch built-in oven.
  • An error occurred with Postgres not being able to retrieve responses, indicating potential issues with the sub-agent's functionality.

Testing and Debugging Challenges

Manual Testing Insights

  • The speaker is manually testing module 8 but encounters errors that trigger the sub-agent, highlighting complexities in functionality.
  • There is a need for more logging and debugging files to understand issues better, especially when agents communicate within the codebase.

Error Analysis

  • An error from the Supabase Python client indicates no content returned (a 204 status code); the client raises an exception instead of returning a null value.
  • Frequent commits to Git are emphasized as crucial for easier rollbacks during development.
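
The fix pattern for the 204 case is to normalize empty responses to `None` instead of letting an exception propagate. The response object below is illustrative of the shape involved, not the Supabase client's exact API:

```python
def rows_or_none(resp):
    """Treat an empty or missing payload as 'no data' rather than an error."""
    data = getattr(resp, "data", None)
    return data if data else None

class FakeResp:
    """Stand-in for a client response carrying a .data attribute."""
    def __init__(self, data):
        self.data = data

assert rows_or_none(FakeResp([{"id": 1}])) == [{"id": 1}]
assert rows_or_none(FakeResp([])) is None
```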

User Interface Issues

Document Search Functionality

  • Observations reveal formatting issues where tool calls should appear sequentially below messages but do not function correctly.
  • Rendering problems on the chat interface are noted, particularly concerning how tool calls are displayed; improvements are needed for clarity.

Enhancements Needed

  • Transparency in search results is requested, similar to interfaces seen in other LLM applications like Claude and ChatGPT.
  • The speaker suggests replicating an effective layout from Claude that includes nested steps and clear thought processes during interactions.

Sub-Agent Functionality Exploration

Triggering Analyze Document Tool

  • Attempts were made to trigger the analyze document tool using file names; however, access issues arose due to incorrect ID mapping.

Mapping Issues Identified

  • It was determined that mapping should refer to file names rather than IDs based on metadata associated with chunks.

Weather and Document Analysis Issues

Current Weather Functionality

  • The current-weather feature for Galway is not functioning as expected, showing no results despite attempts to check the weather.
  • Steps for displaying information are not appearing in the correct sequence; they should be positioned under the first sentence of the inference.

Server and Git Management

  • The speaker suggests using git checkout or git worktrees to manage clones of the codebase, allowing parallel bug fixes without server restarts interfering with each other.
  • Emphasizes the importance of a verification loop to automate feedback processes instead of relying on manual input.

Error Handling and Tool Calls

  • A new error message indicates issues with document chunks in schema cache, highlighting potential lazy coding practices.
  • Despite tool calls disappearing after completion, there is an acknowledgment that saving tool call history is essential for tracking previous interactions.

Context Window Adjustments

  • The context window was mistakenly set to 4,000 instead of 70,000; adjustments are being made to accommodate larger data processing needs.
  • There’s a need for UI updates to display which tools have been called during conversations effectively.

Data Persistence and Configuration Options

  • Discussion about saving tool calls into a messages table for better tracking across sessions; currently, tool calls only exist within active sessions.
  • Proposes making configurable parameters like maximum tool rounds dynamic and ensuring LLM outputs responses before concluding on a tool call.
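
Persisting tool calls could look like serializing them into a JSON column on the messages table, so the chat UI can re-render which tools ran after a reload. Table and column names here are hypothetical:

```python
import json

def message_row(session_id: str, content: str, tool_calls: list[dict]) -> dict:
    """Build a messages-table row that carries its tool-call history."""
    return {
        "session_id": session_id,
        "role": "assistant",
        "content": content,
        "tool_calls": json.dumps(tool_calls),  # e.g. a JSONB column in Postgres
    }

row = message_row(
    "s1",
    "Here are the results.",
    [{"tool": "document_search", "args": {"query": "water supply"}}],
)
restored = json.loads(row["tool_calls"])  # survives across sessions
```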

Supabase Migrations and Document Analysis

Testing Supabase Migrations

  • The discussion begins with a focus on ensuring that future coding agents can execute Supabase migrations effectively.
  • A document search is initiated to determine if an appliance can be used outdoors, but the process encounters a looping issue, prompting a switch to a different model for testing.

Model Comparisons and Features

  • The Qwen3 32-billion-parameter model is tested, revealing that it emits "think tags," which were absent in the previous mixture-of-experts model.
  • An example of how think tags render in the application is shown, indicating improvements in visual representation but also highlighting some bugs related to multiple think tags.
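
Qwen3-style reasoning output wraps its thought process in `<think>...</think>` markers, which the UI must split out before rendering the answer. A minimal parser sketch (the real frontend logic is more involved):

```python
import re

THINK = re.compile(r"<think>(.*?)</think>", re.S)

def split_thinking(text: str):
    """Separate <think> blocks from the visible answer text."""
    thoughts = [t.strip() for t in THINK.findall(text)]
    answer = THINK.sub("", text).strip()
    return thoughts, answer

raw = "<think>The manual says indoor use only.</think>No, it is indoor-only."
thoughts, answer = split_thinking(raw)
```

Collapsing all `thoughts` into a single expandable bubble would avoid the multiple-bubble rendering bug described below.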

Interface Bugs and User Experience

  • Issues arise with rendering multiple thought process bubbles; only one should appear at a time, leading to confusion during tool responses.
  • Observations indicate that while progress has been made, there are still significant bugs affecting user experience and interface functionality.

Analyzing Tool Responses

  • The conversation flow appears intact as the system processes user input correctly; however, duplication issues occur within the output from sub-agents.
  • The complexity of rendering outputs from both main agents and sub-agents leads to potential miscommunication in displaying results.

Debugging Rendering Issues

  • Switching back to OpenAI's GPT 5.2 reveals that rendering issues stem from duplicating outputs between main agent text and sub-agent text.
  • After further testing with local models, minor bugs around thought processes remain evident but show signs of improvement as adjustments are made.

Final Checks and Improvements

  • A final check confirms that while most functionalities work well, there are still small bugs regarding how thinking tags display after tool calls.
  • Detailed analysis shows three traces during processing: user questions, sub-agent actions, and missing thinking tags in traces despite being rendered visually on screen.

Integration of Mixture of Experts Model

Setting Up the Model

  • The speaker discusses integrating a mixture of experts model, addressing previous issues with text duplication that may have stemmed from sub-agent tool calls.
  • Adjustments are made to the context length, setting it to 70,000 for optimal GPU performance while preparing to ask questions.

Document and Web Searches

  • The system performs document searches in response to specific queries about product details, demonstrating its capability to retrieve relevant information effectively.
  • A summary request for a built-in oven triggers a sub-agent that loads data into memory before providing an answer.

Completion of Modules and Future Plans

Module Development Status

  • Modules seven and eight are completed, featuring additional tools like sub-agents and dark mode.
  • The speaker notes the need for further testing before deployment, emphasizing the importance of managing production projects properly.

Deployment Considerations

  • Discussion on the necessity of structured release cycles including staging environments and rollback strategies if deployments fail.
  • The current state is described as an alpha version due to incomplete testing; smoke tests have been conducted but more validation is required.

Capabilities and Features Overview

System Functionality

  • The application now includes an agentic RAG (Retrieval-Augmented Generation) system with a comprehensive chat interface capable of handling various document types such as PDFs and PowerPoints.
  • It supports multiple functionalities including tool calls, markdown rendering, LLM swapping, embeddings configurations, rerankers, and web searches.

User Management

  • The system supports multiple users with login isolation through row-level security (RLS), though unrestricted access to the sales data still requires policy adjustments.

Future Directions in AI Development

Testing and Improvements Needed

  • While significant progress has been made across eight modules, there remains much work ahead in refining features and ensuring readiness for production use.

Educational Resources Available

  • Encouragement to subscribe for future videos detailing ongoing developments; mentions over 30 existing videos covering advanced RAG systems design patterns among other topics.
Video description

👉 Get access to our complete code files for Modules 3-8 & join hundreds of builders creating production-grade RAG systems in our community: https://www.theaiautomators.com/?utm_source=youtube&utm_medium=video&utm_campaign=tutorial&utm_content=claude-code-rag

In this comprehensive masterclass, we collaborate with Claude Code to build a full-featured Agentic RAG application from the ground up. No frameworks, no shortcuts — just raw code and real-world implementation patterns.

🔗 Get Started:
GitHub Repo (PRD, Prompts & Full Setup): https://github.com/theaiautomators/claude-code-agentic-rag-masterclass
Module 1 & 2 Code Files: https://drive.google.com/drive/folders/1jHdWmKSNOXRsA_Ga05stRlDFrA9pGAVw

🎯 What You'll Learn:
✅ Complete Agentic RAG architecture (React + Python + Supabase)
✅ Claude Code development workflow — plan, build, validate, iterate
✅ Document ingestion with DocLing (PDFs, DOCX, PPTX, images)
✅ PG Vector embeddings with hybrid search & re-ranking
✅ Metadata extraction for precision filtering
✅ Text-to-SQL with database-level security
✅ Sub-agents for full document analysis
✅ Local model support (LM Studio, Qwen3) or cloud (OpenAI, OpenRouter)
✅ LangSmith observability for debugging your RAG pipeline
✅ Multi-user authentication with row-level security

🔗 Links:
Supabase: https://supabase.com/
DocLing: https://docling.io/
LangSmith: https://smith.langchain.com/
OpenRouter: https://openrouter.ai/
LM Studio: https://lmstudio.ai/

⏱️ Timestamps:
00:00:00 Intro
00:05:39 High Level Architecture
00:23:10 Module 1 - App Shell
00:51:51 Module 2 - RAG Ingestion
01:11:44 Module 3 & 4 - Records & Metadata
01:20:44 Module 5 & 6 - Docling & Hybrid Search
01:28:26 Module 7 - Additional Tools
01:36:31 Module 8 - RAG Sub-Agent

💬 Questions or Comments?
Where do you stand on this — autonomous coding agents vs. actually collaborating and staying in the loop? Let me know below!