How to Think About Memory in AI Agents (ft. Richmond Alake)

Understanding Agent Memory in AI

Introduction to the Discussion

  • Andrew Ng introduces Richmond Alake, focusing on the question of how to model memory in AI agents, which has been a significant point of exploration for Richmond.
  • Richmond's background includes experience in both AI and database systems, emphasizing his unique perspective on data movement and storage within agent architecture.

Richmond Alake's Background

  • Richmond shares his educational journey, starting with a degree in software engineering followed by a master's degree focused on AI.
  • His career path includes roles as a web developer, computer vision engineer, machine learning architect, and currently as an advocate for AI developer experience.

Course Development with Andrew Ng

  • The course developed with Andrew aimed to address relevant topics for AI developers that would remain applicable over time.
  • Richmond highlights the importance of creating content that remains relevant years later, touching upon concepts like prompt compression and query optimization.
  • He notes that the course was designed to cover foundational ideas still being discussed in current research papers.

Defining Agent Memory

  • When asked about agent memory, Richmond explains it through relatable human experiences—emphasizing its importance for reliability and trustworthiness in interactions.
  • He uses the analogy of a travel agent forgetting client preferences to illustrate why memory is crucial for effective long-term task management in agents.

Technical Aspects of Agent Memory

  • In technical terms, agent memory involves conceptualizing data movement across key components: embedding models, external memory (such as databases), and large language models (LLMs).
  • Understanding how data flows between these components is essential for developing capable and adaptable AI agents.

Understanding Memory Types in Agentic Systems

Overview of Memory Types

  • The speaker discusses various types of memory, including working memory and episodic memory, emphasizing the distinction between long-term and short-term memory.
  • Memory is categorized into two forms: long-term and short-term. Short-term memory can be likened to an agent's context window, while long-term memory resembles an external storage system like a database.

Short-Term Memory Forms

  • Within short-term memory, distinct forms exist such as working memory and semantic cache. Working memory serves as a temporary holding area for information.
  • A semantic cache allows quick retrieval of previously stored responses for similar incoming queries, improving efficiency by avoiding unnecessary calls to the LLM.

Long-Term Memory Insights

  • Long-term memory is crucial for making agents reliable and adaptable. It includes procedural, semantic, and episodic memories borrowed from neuroscience.
  • Procedural memory involves routines or skills that can be performed without conscious thought; this is analogous to how agents can implement workflow memories.

Workflow Memory in Agents

  • Workflow memory allows agents to store execution steps systematically (e.g., step one through three), enabling them to learn from previous actions.
  • This type of procedural implementation helps agents understand useful trajectories by storing outcomes associated with specific workflows.

Comparison of Caching vs. Database Storage

  • The discussion highlights the difference between caching (to save operational costs by reducing system trips for similar queries) versus storing data in a database.
  • Both systems are complementary; workflow memory units can be cached to optimize performance without conflicting with other storage methods.

Additional Types of Procedural Memory

  • The speaker briefly mentions additional types within procedural memory such as toolbox or skill box memories, which store specific skills or schemas relevant to agent functionality.

Understanding Memory Architecture in AI

Types of Memory in AI

  • The discussion begins with the classification of memory types: procedural memory, episodic memory, and semantic memory. Procedural memory involves skills and routines, while episodic memory captures conversational history and interactions.
  • Semantic memory is described as a knowledge base about the world, including information about people (entity memory). Other forms of semantic memory are acknowledged but not detailed.

Building Mental Models for Memory

  • The speaker emphasizes the importance of constructing mental models to architect effective memory systems. This marks a shift from simply appending conversations to developing sophisticated models that enhance continuity across sessions.
  • A key aspect of episodic memory is its ability to store past interactions with agents, allowing for continuity without needing constant reminders about preferences or styles.

Implementing Memory Engineering

  • The implementation of memory architecture leads into the discipline known as "memory engineering," which balances simplicity and complexity while adhering to engineering principles.
  • Database engineers focus on retrieval latency; similarly, AI engineers must ensure low-latency operations for agent systems. This highlights an intersection between database optimization and AI development.

Evolution from Prompt Engineering to Memory Engineering

  • The conversation transitions into the evolution from prompt engineering to context engineering, ultimately leading towards a focus on memory engineering or agent engineering.
  • The realization that prompt engineering alone is insufficient stems from discussions around how best to model AI's capacity for long-term interaction and learning.

Insights from Research on Agent Memory

  • A pivotal moment occurs when referencing a course by Andrew Ng where he posed critical questions regarding modeling memory in AI agents. This inquiry sparked deeper exploration into the topic.
  • Mentioned research includes generative agents developed at Stanford that utilized long-term memories effectively, enhancing task completion through weighted information based on relevance, importance, and recency.


Agent Engineering and Memory Units

Transition from Prompt Engineering to Context Engineering

  • The term "prompt engineering" is seen as limiting, failing to capture the complexity of the tasks involved in building systems.
  • The shift towards "context engineering" reflects a deeper understanding that optimal token selection is crucial for effective system performance.
  • Future considerations include ensuring efficient retrieval pipelines for optimal tokens, emphasizing the importance of security and data governance.

Understanding Memory Units

  • A memory unit is defined as a minimal representation of information within an agentic system, encompassing attributes like content, timestamp, and user/assistant roles.
  • Different types of memory (e.g., conversational vs. workflow memory) require distinct attributes to be captured in their respective memory units.
  • Memory units can change based on context; for instance, workflow systems may prioritize capturing steps and outcomes.
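The memory-unit attributes listed above can be expressed as simple dataclasses. The two shapes below (conversational vs. workflow) are illustrative: the field names follow the bullets, not any specific library's schema.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class ConversationMemoryUnit:
    """Minimal unit for conversational memory: content, role, timestamp."""
    content: str
    role: str        # "user" or "assistant"
    timestamp: str

@dataclass
class WorkflowMemoryUnit:
    """Workflow memory prioritizes steps and outcomes over roles."""
    task: str
    steps: list[str]
    outcome: str
    timestamp: str

def now_iso() -> str:
    return datetime.now(timezone.utc).isoformat()
```

Using `asdict` makes either unit trivially serializable into whatever database serves as the memory core.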

Key Components of Agentic Systems

Memory Manager

  • The memory manager encompasses all software engineering related to external memory storage, handling operations such as reading and updating memory units.
  • Richmond mentions a DIY open-source memory manager as an example, but advises against using it in production environments.

Memory Core

  • The "memory core" refers to the primary database where most memory operations occur within an agentic system. It highlights that memory isn't confined to one part but distributed across the system.

Memory Management in AI Systems

Understanding Memory Components

  • The memory architecture consists of three components: an embedding model, a database (the memory core), and a large language model (LLM). The memory core is crucial for security, privacy, and retrieval mechanisms.
  • Developers are encouraged to build their own memory managers rather than relying solely on existing tools. This allows for customization based on the complexity needed.
  • Selecting a robust memory core is essential; it should accommodate future data types, particularly vectors which are gaining popularity.

Data Retrieval Strategies

  • Various databases like Supabase and traditional PostgreSQL offer different functionalities, including vector retrieval and real-time capabilities.
  • Many concepts in AI may seem new but often have historical roots; for instance, vectors have long been part of mathematics but their application in data handling is evolving.

Robustness in Agentic Systems

  • At Oracle's database division, emphasis is placed on creating a robust memory core that can handle diverse data types such as JSON-like structures or knowledge graphs.
  • Managing multiple databases for different data types can lead to unnecessary complexity. A unified approach with one capable memory core simplifies development efforts.

Efficiency in Database Management

  • Using multiple databases incurs technical debt due to increased energy, time, and resource expenditure. A single comprehensive solution reduces this burden significantly.
  • Developers should focus on experimenting with machine learning techniques rather than complicating their database management systems.

The Concept of Agent Harnesses

  • "Agent harness" refers to the engineering processes surrounding agentic systems aimed at achieving reliable outputs. It emphasizes software engineering principles rather than being a novel concept.
  • The evolution of terminology reflects the growing understanding of tasks within software engineering—transitioning from prompt engineering to context engineering and now towards memory engineering.
  • Abstractions in software engineering simplify complex topics, enabling better architecture of intricate systems while ensuring reliability and scalability.

Harnessing Language Models: Understanding Complexity

The Role of Harnesses in Development

  • Developers often need to create their own harnesses for language models, as existing frameworks can be overly complicated.
  • LangGraph supports both simple and complex applications, underscoring the value of a harness that provides essential abstractions and functionalities.

Misconceptions About Language Models

  • A common misconception is that language models (LLMs) can handle memory effectively; however, they are essentially "frozen in time" based on their last training dataset.
  • Users often overlook the necessity of providing LLMs with structured information to enhance understanding and performance.

Context Engineering and Task Complexity

  • Frontloading concerns through context engineering can improve accuracy by simplifying tasks for LLMs, allowing them to focus on reasoning rather than selection.
  • Reducing task complexity by separating selection from reasoning enhances overall performance in LLM applications.

Structuring Information for Memory Types

  • Organizing information within the context window according to different memory types (episodic vs. semantic) can lead to better utilization of data by LLMs.
  • LLMs have an innate understanding of hierarchical information structures due to their training on internet data, which includes various markup languages.
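Organizing the context window by memory type can be sketched as a prompt builder that wraps each section in markup tags, which LLMs parse well given their training data. The tag names (`episodic_memory`, `semantic_memory`, `task`) are illustrative choices, not a standard.

```python
def build_context(episodic: list[str], semantic: list[str], task: str) -> str:
    """Arrange retrieved memories into clearly tagged sections of the prompt."""
    def section(tag: str, items: list[str]) -> str:
        body = "\n".join(f"- {item}" for item in items)
        return f"<{tag}>\n{body}\n</{tag}>"

    return "\n\n".join([
        section("episodic_memory", episodic),   # past interactions
        section("semantic_memory", semantic),   # facts about the world/user
        f"<task>\n{task}\n</task>",             # the current request
    ])
```

Whether plain text, JSON, or YAML works best for a given model and memory type is, as noted above, something to settle by experimentation.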

Data Representation Choices

  • There is debate over the best way to represent data for different memory types; options include plain text or JSON formats.
  • Experimentation is crucial in AI development; engineers must build harnesses capable of accommodating various data structures like JSON or YAML.

Implications of Recent Research

  • The recent RLM paper introduces a method for handling long contexts through virtualized memory, prompting discussions about its implications on existing mental models regarding memory in AI systems.

Understanding Recursive Language Models and Context Offloading

Overview of Recursive Language Models (RLM)

  • The term "Recursive Language Model" (RLM) may imply a new variant of large language models, but it actually refers to an agent harness rather than an internal modification of the model itself.
  • RLM focuses on context offloading into a sandbox environment, which is a technique for managing long context tasks by engineering context effectively.

Context Offloading Techniques

  • The RLM paper highlights various methods to address long horizon tasks, allowing systems to process more information without necessarily improving learning or memory retrieval capabilities.
  • While RLM enhances processing capacity, it does not inherently provide cross-session memory or improve retrieval pipelines for memory units.

Memory Representation and Dependency Graphs

  • A significant insight from the RLM paper is its potential to create dependency graphs that represent hierarchical relationships in data, moving beyond simple procedural lists.
  • For example, legal contracts often contain self-referencing clauses; representing these as dependency graphs can enhance understanding and navigation through complex documents.

Associative Memory and Knowledge Construction

  • Associative memory plays a role in how we connect concepts; for instance, thinking about "Conor Ren" might evoke related ideas like "blood" or "roses."
  • The sophistication of RLM lies in its virtualized memory approach—storing data as Python objects rather than consuming entire documents into the model's context window.

Enhancements in Document Processing

  • By slicing legal documents into manageable parts using recursion, RLM allows focused examination of relevant clauses while maintaining contextual integrity.
  • This method reduces the context rot problem by intelligently selecting necessary parts of lengthy contexts for processing.
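The offloading idea can be sketched as a document held outside the context window that the agent inspects through cheap operations (stats, peeks, regex searches) instead of reading whole. This is a loose illustration of the virtualized-memory pattern, not the RLM paper's implementation; the class and method names are invented.

```python
import re

class OffloadedContext:
    """Holds a long document as a Python object in a sandbox; the agent
    sees only small slices of it, never the full text at once."""

    def __init__(self, text: str):
        self.text = text

    def stats(self) -> dict:
        # Cheap metadata the agent can read before deciding what to slice.
        return {"chars": len(self.text), "lines": self.text.count("\n") + 1}

    def peek(self, start: int = 0, size: int = 200) -> str:
        return self.text[start:start + size]

    def grep(self, pattern: str, window: int = 80) -> list[str]:
        # Return small snippets around each match, not the whole document.
        return [self.text[max(m.start() - window, 0):m.end() + window]
                for m in re.finditer(pattern, self.text)]
```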

Challenges with Sub-Agent Systems

  • The integration of sub-agents within this framework raises questions about computational efficiency and latency due to potential over-generation of agents during task execution.
  • Effective management of sub-agent creation is crucial; limiting their number based on task complexity can mitigate resource strain while ensuring effective operation.

Understanding Memory Management in AI Systems

Key Concepts in Memory Management

  • The paper discusses stopping recursion at one layer deep, allowing only a single sub-agent call to manage complexity and control.
  • Enforcing synchronicity prevents asynchronous forces from overwhelming the system, which could lead to chaos.
  • Cost analysis shows that this method is often cheaper than traditional auto-completion methods, particularly at the 95th percentile under specific tests.
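The one-layer, synchronous recursion described above can be sketched as follows. The `call_llm` function is a placeholder, and the split/merge logic is a simplification of what the paper does; the point is only the depth cap and the strictly sequential sub-calls.

```python
def call_llm(prompt: str) -> str:
    # Placeholder for a real model call.
    return f"answer({len(prompt)} chars)"

def recursive_answer(prompt: str, chunks: list[str], depth: int = 0,
                     max_depth: int = 1) -> str:
    """Spawn at most one layer of sub-calls: the root splits the context,
    each chunk is answered by a depth-1 call, and recursion stops there."""
    if depth >= max_depth or len(chunks) <= 1:
        return call_llm(prompt + "\n".join(chunks))
    partials = [recursive_answer(prompt, [c], depth + 1, max_depth)
                for c in chunks]  # synchronous: one sub-call at a time
    return call_llm(prompt + "\n".join(partials))
```

Capping `max_depth` at 1 is exactly the guardrail discussed above: it bounds cost and keeps sub-agent creation under control.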

Challenges of Implementation

  • The authors acknowledge the need for guardrails in production environments to manage memory effectively.
  • There is no comparison with other techniques like Mem0 or MemGPT; instead, it focuses on summarization and vector retrieval methods.

Performance Insights

  • The discussion highlights that while existing systems have been around for some time, they may not be directly comparable due to evolving methodologies.
  • Using base models without memory management will likely yield inferior results compared to more advanced systems.

User Interaction Considerations

  • A significant challenge arises from user unpredictability when inputting data into the system; diverse queries can complicate processing.
  • The paper suggests avoiding short prompts for complex tasks; instead, it recommends using long contexts where this approach excels.

Context Complexity and Decomposition

  • The authors propose decomposing context problems into two dimensions: length (token count) and task/document complexity.
  • High internal referencing within documents or codebases increases complexity, making effective memory management crucial for performance.

Summarization Techniques and Information Loss

  • Users expect a one-size-fits-all solution despite varying complexities in their inputs; this expectation poses challenges for developers.
  • Compaction can be viewed as summarization; however, there are risks of information loss during this process.


Understanding Memory Compaction in AI

Techniques for Information Compaction

  • Discusses the concept of compacting information through summarization and externalizing snippets into memory, allowing for reduced information loss.
  • Introduces "just-in-time retrieval" as a technique where agents can call context IDs from externalized memory to enhance efficiency.
  • Highlights the importance of externalizing tool calls and providing placeholders within the context to maintain clarity while reducing data clutter.
  • Emphasizes that compaction is not limited to summarization; it also involves managing memory effectively by using unique identifiers.
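The just-in-time retrieval pattern above can be sketched as a store that swaps bulky content for placeholder IDs in the context, with the full text fetchable on demand. The ID scheme and placeholder format are illustrative assumptions.

```python
class JustInTimeContext:
    """Replace bulky tool outputs with placeholder IDs in the context;
    the agent fetches the full content by ID only if it proves necessary."""

    def __init__(self):
        self._external: dict[str, str] = {}

    def externalize(self, content: str) -> str:
        ctx_id = f"ctx-{len(self._external)}"
        self._external[ctx_id] = content
        preview = content[:40]
        # The placeholder keeps the context readable while shedding the bulk.
        return f"[{ctx_id}: {preview}... fetch('{ctx_id}') for full text]"

    def fetch(self, ctx_id: str) -> str:
        return self._external[ctx_id]
```

Unlike pure summarization, nothing is lost here: the compacted context keeps a unique identifier, and the original content stays retrievable.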

Future of Agentic Applications

  • Explores predictions about agentic applications in 12 months, emphasizing improvements in memory capabilities as a key development.
  • Predicts that future applications will be more personalized, enhancing user experience and interaction with AI systems.
  • Stresses the significance of security and privacy in building trust within these systems, particularly regarding customer data management at Oracle.

Enhanced Memory Systems

  • Suggests that future agentic systems will require sophisticated memory-focused architectures to foster trust among users.
  • Envisions specialized agents tailored for specific domains (e.g., health, finance), each accessing a shared core memory system for consistency across interactions.

Addressing Data Silos

  • Discusses the necessity of having one shared memory core to avoid data silos, which hinder effective model building and access across teams.
  • Points out current challenges with siloed data in enterprises and advocates for unified access to enhance collaboration and innovation.

Designing Future Agentic Systems

  • Recommends focusing on centralized memory design while allowing different harnesses for domain-specific applications to ensure flexibility without compromising data integrity.
  • Concludes with thoughts on how future experiences will be shaped by consistent data sources while maintaining unique interactions tailored to individual needs.
Video description

🔗 AI Engineering Consultancy: https://brainqub3.com/
🔍 AI Fact-Checking Tool: https://check.brainqub3.com/

Richmond Alake has been thinking about agent memory since Andrew Ng asked him to help teach prompt compression and query optimization. In this conversation, we explore how his background in databases has shaped a different way of thinking about where memory should live in AI agent architectures. We cover the hierarchy that sits above prompt engineering and context engineering, why most agent builders are optimizing at the wrong level, and a principle about memory placement that I've already started applying in my own work. This one's an exploration for anyone building agents and wanting to think more rigorously about how data flows through their systems.

Richmond's Links:
📓 Memory & Context Engineering Notebook: https://github.com/oracle-devrel/oracle-ai-developer-hub/blob/main/notebooks/memory_context_engineering_agents.ipynb
💼 LinkedIn: https://www.linkedin.com/in/richmondalake/
🎥 YouTube: https://www.youtube.com/@richmond_a

Timestamps:
0:00 - Intro
[Add remaining timestamps]

#AIAgents #ContextEngineering #AgentMemory #LLMs #AIEngineering