DSPy: The End of Prompt Engineering - Kevin Madura, AlixPartners
Introduction to DSPy
Overview of the Session
- The speaker welcomes attendees and introduces the topic of DSPy, encouraging questions throughout the session.
- Acknowledges that while there will be some slides, the focus will shift to coding in the latter half. A GitHub repository is available for participants to download code.
Understanding DSPy
- Defines DSPy as a declarative framework for building modular AI software, aimed at those who may not be full-time engineers.
- Shares personal experiences using DSPy for various projects, highlighting its efficiency in quickly iterating on applications and programs.
Practical Applications of DSPy
Code Demonstration and Use Cases
- Mentions several use cases that will be demonstrated: a sentiment classifier, PDF processing, multimodal work, a web research agent, text summarization, and optimization with GEPA.
Key Features of DSPy
- Emphasizes how DSPy allows users to decompose logic into programs that treat LLMs (Large Language Models) as first-class citizens.
- Highlights the structured outputs and input/output type guarantees provided by DSPy's primitives.
Advantages of Using DSPy
Abstraction Level
- Discusses how DSPy operates at a higher level of abstraction than frameworks like LangChain; it focuses on user intent rather than low-level details.
Program Development Focus
- Stresses that users are building proper Python programs instead of merely tweaking strings or prompts; this leads to more robust software development.
System Design Philosophy
Systems Mindset
- Quotes Omar Khattab (DSPy's creator), explaining that DSPy is designed with a systems mindset, which helps encode user intent effectively while adapting to evolving model capabilities.
Flexibility and Control Flow
- Notes that while retaining control flow within programs, users can switch between different models as needed without losing functionality.
Conclusion on Robustness and Alternatives
Convenience Without Compromise
- Affirms that convenience comes naturally with DSPy's design; it minimizes unnecessary parsing tasks while maintaining clarity in program structure.
Comparison with Other Libraries
- Acknowledges other libraries such as Pydantic AI and LangChain but emphasizes the unique aspects of DSPy's approach.
Understanding DSPy: Key Concepts and Applications
Introduction to DSPy
- The speaker emphasizes the importance of an open mind when exploring DSPy, suggesting that experimentation with code is crucial for understanding its functionality.
- This talk focuses on practical applications of DSPy rather than exhaustive details, aiming to share personal experiences and solutions found through using DSPy.
Core Concepts of DSPy
- The core concepts of DSPy are summarized into five or six key elements, which are elaborated upon throughout the discussion.
- Signatures define the desired function call by specifying inputs and outputs, allowing users to defer implementation details to the LLM (Large Language Model).
Modules and Tools in DSPy
- Modules serve as logical structures within a program, containing one or more signatures along with additional logic. Their design is modeled on PyTorch modules.
- In DSPy, tools are essentially Python functions that can be easily exposed to the LLM within its ecosystem.
Adapters and Their Role
- Adapters act as intermediaries between signatures and LLM calls, translating inputs/outputs into a format suitable for prompts sent to the LLM.
- There is ongoing research regarding optimal formats (e.g., XML, BAML, JSON), with adapters providing flexibility in choosing these formats.
Optimizers and Metrics
- Optimizers are a notable feature of DSPy but should not overshadow its other functionality; they enhance program structuring alongside signatures and modules.
- Metrics work in conjunction with optimizers, measuring success within a DSPy program and guiding the optimization path.
Signatures Explained
- Signatures express declarative intent through simple strings or complex class-based objects (like Pydantic), where field names can serve as mini-prompts for models.
- The naming of parameters is critical; intuitive names help models understand input requirements effectively.
Custom Prompts Integration
- Users can incorporate existing, effective prompts into their workflow without losing their value; this can be done via docstrings or direct string injection during prompt construction.
- The ability to start from custom prompts lets users build on proven strategies while leveraging the structure DSPy provides.
Understanding DSPy Modules and Their Functionality
Overview of DSPy Implementation
- The speaker discusses initial confusion around the shorthand version of implementing logic in DSPy, emphasizing that it allows users to defer complex implementations to the model.
- A simple sentiment classifier can be created by providing text input and receiving an integer output for sentiment, with additional instructions clarifying the meaning of different sentiment values.
- This shorthand approach facilitates quick experimentation and iteration without needing to craft detailed prompts from scratch.
Modular Structure in DSPy
- Modules serve as a foundational abstraction layer for DSPy programs, allowing users to build upon existing modules or create new ones based on effective techniques.
- The design encourages composability and optimization, enabling logical separation of program components while integrating LLM calls effectively.
Built-in Modules and Techniques
- The speaker mentions various built-in modules, such as dspy.Predict, which makes a straightforward language model call.
- Other built-in modules encode prompting methodologies such as "chain of thought," which guides models through reasoning steps, though some of these techniques are less relevant with today's models.
Tool Integration in DSPy
- ReAct serves as a tool-calling interface within DSPy, allowing Python functions to be injected into the model seamlessly.
- The "Program of Thought" module enables models to reason in code, returning results while supporting custom Python interpreters for specific workflows.
Practical Application Example
- An example illustrates how a module can ensure time entries adhere to formatting standards using defined signatures and business logic interspersed with LLM calls.
- The implementation involves defining signatures at the top level and utilizing vanilla predict calls alongside hard-coded logic for practical use cases.
Web Tools and Adapters in LLMs
Overview of Web Tools
- The speaker discusses the use of web tools, emphasizing a controlled approach by limiting operations to five rounds to prevent erratic behavior.
- Introduction of adapters as prompt formatters that convert input signatures into specified message formats for better interaction with language models.
Understanding Adapters
- Example provided on how a JSON adapter transforms a Pydantic object into a structured prompt for the LLM, showcasing input fields like clinical note type and patient info.
- Clarification on the existence of a base adapter that serves general purposes while allowing customization for specific needs.
Performance Comparison
- Reference to testing conducted by an individual named Pashant comparing JSON and BAML adapters, highlighting improved intuitiveness and potential performance gains (5-10%).
- Emphasis on how switching from JSON to BAML can enhance information presentation without altering the underlying program structure.
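Switching formats without touching the program can be sketched as a one-line configuration change; whether a BAML adapter ships in your DSPy version is an assumption to verify:

```python
import dspy

# The adapter re-renders every signature into the chosen prompt format and
# parses replies back; the program itself is unchanged.
dspy.configure(adapter=dspy.JSONAdapter())

# dspy.configure(adapter=dspy.BAMLAdapter())  # alternative format, if available
```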
Multimodality and File Handling
Multimodal Support in DSPy
- Discussion of DSPy's support for multiple modalities, including images and audio, facilitating easy integration into workflows.
- Mention of an additional library called "attachments" designed to simplify file handling across various formats, enhancing usability with LLMs.
Practical Application Example
- An example is given involving a PDF document where users can simply provide a link for processing without needing intricate setup or understanding of backend processes.
- The speaker illustrates using RAG (Retrieval-Augmented Generation), asking questions based on documents fed into the system, demonstrating ease of use.
Optimizers in Model Performance
Introduction to Optimizers
- Optimizers are introduced as powerful tools that may outperform traditional fine-tuning methods under certain conditions, particularly in in-context learning scenarios.
Benefits of Using Optimizers
- Encouragement to experiment with optimizers before resorting to extensive infrastructure setups; they offer essential primitives for measuring and improving model performance quantitatively.
Transferability Through Optimization
- Explanation of how optimizers enable transferability between tasks and models; by adjusting the model configuration (e.g., swapping GPT-4.1 for GPT-4.1 nano), operational costs can be reduced significantly while maintaining acceptable performance.
Understanding DSPy and Its Optimization Techniques
Overview of DSPy Functionality
- The core optimization loop iteratively refines prompts, enhancing performance across a structured program composed of various modules.
- DSPy optimizes components within a program to improve input-output performance, its role being that of a facilitator rather than an optimizer itself.
Clarifying the Role of DSPy
- Omar clarifies that while DSPy is not itself an optimizer, it provides programming abstractions that allow optimization as an added benefit.
- Insights from Dwarkesh Patel and Andrej Karpathy highlight potential pitfalls of using LLMs as judges, given models' ability to exploit weaknesses in their evaluators.
Adversarial Examples and Model Performance
- The discussion points out that LLMs can find adversarial examples that degrade performance when LLMs are used improperly as evaluators.
- Optimizers in DSPy probe these same weaknesses constructively, identifying areas of the program that need improvement.
Program Construction and Metrics
- A logical flow in constructing programs involves decomposing logic into modules and utilizing metrics to define operational contours.
- Recent discussions indicate that current optimizers are performing on par with or exceeding traditional fine-tuning methods like GRPO, showcasing advancements in prompt optimization techniques.
Defining Success Through Metrics
- Metrics serve as foundational elements for defining success criteria for optimizers, guiding them in assessing the impact of prompt adjustments on performance.
- Examples illustrate how metrics can range from rigorous equality checks to more subjective evaluations based on adherence to defined criteria.
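The range from strict equality to softer judgment starts with the simplest case; a metric is just a function taking a gold example, a prediction, and an optional trace, and returning a score. The `urgency` field name below is an assumption for illustration:

```python
# Strict exact-match metric: returns True/False, which optimizers treat as 1/0.
def urgency_metric(example, pred, trace=None):
    return example.urgency == pred.urgency
```

A more subjective metric could instead call an LLM judge inside the function and return a graded score.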
Building Complex Workflows with DSPy
- The speaker emphasizes that DSPy equips users with the essential primitives for constructing complex workflows and data-processing pipelines involving LLM integration.
Community Engagement and Practical Application
- Encouragement is given to engage with online communities for further learning about the latest developments related to DSPy and its applications.
- Transitioning into practical demonstrations, the speaker prepares to showcase code examples relevant to discussed concepts.
Introduction to DSPy and Practical Applications
Overview of the Session
- The session begins with a focus on practical applications of DSPy, encouraging questions and interaction while exploring various Python programs.
- A configuration object is introduced for managing multiple language models (LMs), highlighting the need for different models based on workload complexity.
Utilizing Multiple Language Models
- The speaker discusses using OpenRouter API keys to access three different LMs: Claude, Gemini, and GPT-4.1 mini.
- Each model's free-form response is subjective; however, DSPy allows answers to be defined more strictly by limiting the options through Literal type definitions.
Caching Mechanism in DSPy
- The caching feature in DSPy enhances performance by loading previous results quickly if no changes are made to signature definitions.
- This caching capability is particularly useful during testing, ensuring efficient retrieval of data.
Building a Simple Sentiment Classifier
Implementation Details
- A simple sentiment classifier is demonstrated where text input determines sentiment scores based on predefined criteria (lower scores indicate negative sentiment).
- Example inputs showcase how the classifier evaluates sentiments, illustrating its functionality despite seeming basic.
Signature Utilization
- The importance of shorthand signatures in DSPy is emphasized; they simplify passing parameters like strings and integers into prediction objects.
Usage Tracking and Document Analysis
Built-in Usage Information
- DSPy provides detailed usage information per call, including token-usage metrics that aid observability and optimization efforts.
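A sketch of per-call usage tracking; the exact API surface (`track_usage` / `get_lm_usage`) is an assumption to verify against your DSPy version's documentation:

```python
import dspy

# Enable usage tracking globally (assumed flag in recent DSPy versions).
dspy.configure(track_usage=True)

# pred = dspy.Predict("question -> answer")(question="...")
# pred.get_lm_usage()  # per-model token counts for this specific call
```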
Document Handling Capabilities
- An example involving document analysis showcases how attachments can automatically extract relevant content from PDFs without manual intervention.
Advanced Document Analysis Techniques
Creative Data Structures
- The power of DSPy lies in its ability to handle complex data structures effortlessly due to its integration with Pydantic.
Deferring Structure Definition to Models
- By allowing the model to define key-value pairs from documents autonomously, users can streamline their analysis processes significantly.
Document Analysis and Schema in DSPy
Overview of Document Analyzer Schema
- The document analyzer schema is crucial for extracting important information such as filing dates, which defines the structure of the document analysis.
- The output from the document schema includes key elements like filing date, form date, and form type, formatted for structured outputs.
Accessing Results and Inspecting History
- Dot notation allows quick access to resulting objects from the document analysis process.
- The "inspect history" feature provides a raw dump of system messages and input/output fields used during processing.
Response Format and Adapter Usage
- Responses follow a specific format whose fields are parsed by DSPy's adapter.
- An example using a BAML adapter demonstrates how to define simple models with patient details integrated into clinical notes.
LLM Calls and Context Management
- Two calls are made: one using a smart LLM with built-in adapters, and another utilizing a defined BAML adapter.
- Python's context management allows switching between different LLM definitions for specific calls without affecting global settings.
Comparing Outputs from Different Adapters
- Both LLM calls yield identical outputs despite differing adapter usage; this highlights flexibility in model application.
- Inspecting history reveals differences in prompt structures between JSON schema and BAML notation, with BAML being more comprehensible.
Multimodal Examples in Document Analysis
Image Analysis Use Case
- A multimodal example involves analyzing street signs to determine parking availability based on time of day.
Reasoning Integration in Responses
- By default, responses include reasoning when using dspy.ChainOfThought; switching to dspy.Predict would exclude reasoning from the results.
Module Structure for Logic Implementation
- Modules encapsulate logic into replicable units; they can incorporate arbitrary business logic or control flows within their design.
Understanding Tool Calling in AI Modules
Core Logic Invocation
- The core logic of the AI module runs when the instantiated module is called; a simple example shows an analyzer ("AIE123") being invoked, with a counter incrementing on each call.
Function Definitions and Module Creation
- Two functions are defined, perplexity_search and get_url_content, as part of creating a bio-research agent module that uses Gemini 2.5 as its LLM.
Async Functionality and Tool Calling
- An answer-generator object is created using dspy.ReAct, enabling tool calling. An async version of the function is also implemented to improve performance.
Data Processing and Classification
- The system loops through instances to determine if individuals have been at their companies for over ten years, utilizing tool calls for up-to-date information.
Debugging Insights
- Similar to reasoning objects in chain-of-thought models, debugging insights can be obtained from tool calls, including arguments passed and observations made during execution.
Exploring Dataset Creation and Metrics
Dataset Overview
- A dataset is created that categorizes various help messages (e.g., "my sync is broken") into classifications such as positive, neutral, or negative urgency levels.
Support Analyzer Module Development
- Multiple modules are packed into a single support analyzer module which defines metrics based on the dataset's characteristics to evaluate message urgency effectively.
Performance Evaluation Process
- The model's performance evaluation involves applying metrics iteratively to refine prompts based on feedback received from previous iterations.
Feedback Mechanisms in Model Training
Teacher Model Feedback Utilization
- In GEPA training, feedback from a teacher model provides textual insights on classification errors, enhancing the learning process by explaining the mistakes made by smaller models.
Iterative Optimization Loop
- This feedback mechanism tightens the iteration loop for prompt optimization by providing specific guidance on how to adjust responses based on prior inaccuracies.
Practical Applications of AI Processing
File Categorization Example
- Demonstrates processing various file types (contracts, images), showcasing how clients can manage large datasets with unknown contents efficiently through classification methods.
Dynamic Model Routing
- The implementation uses different models depending on file type detected; for instance, routing image requests specifically to Gemini models optimized for image recognition tasks.
Document Classification and Summarization Techniques
Overview of Document Processing
- The speaker discusses different approaches to handling various types of documents, such as SEC filings and contracts, emphasizing the need for tailored processing methods based on document type.
- A document classifier is utilized to predict the type of a file by analyzing images extracted from it, ensuring accurate categorization.
- The classification process involves determining if the document is an SEC filing, patent filing, contract, or related to city infrastructure among other categories.
Model Functionality and Use Cases
- The model's effectiveness in classifying diverse document types is highlighted; it can handle multiple categories without issues.
- For contracts specifically, a summarizer object is created that processes each page recursively to extract key information and boundaries within the document.
Summarization Process
- The summarization technique includes detecting boundaries within documents while generating summaries for better comprehension of content structure.
- An example illustrates reliance on the model's ability to classify city infrastructure documents based solely on visual cues present in images.
Practical Application Insights
- In production scenarios, more rigorous classification methods may be necessary, potentially involving multiple models for enhanced accuracy.
- The summarization logic iteratively works through chunks of a contract to create concise summaries while also identifying sections like exhibits or schedules.
Boundary Detection Mechanism
- A detailed explanation follows regarding how PDF pages are converted into images for classification purposes before being processed for boundary detection.
- The output indicates clear demarcation between main documents and supplementary materials (e.g., schedules), showcasing effective boundary detection techniques.
Conclusion on Implementation Challenges
- Discussion shifts towards challenges faced during implementation; asynchronous calls are made for classifying each image in a PDF effectively.
- Emphasis is placed on improving code quality over time with ongoing development efforts aimed at refining these processes.
Understanding the Use of Structured Outputs in Machine Learning
Overview of Signature and Output Generation
- The speaker discusses a signature that defines a tuple for input, which leads to generating a dictionary output with specific types (string, tuple, integer). This process is described as quick and efficient despite being non-production code.
- The initial implementation has shown promising results during testing. There are opportunities for optimization and improvement, but the basic functionality works well.
Iterative Development Process
- The model utilizes self-reflection by calling functions like get_page_images to verify boundaries within the data. This iterative approach helps refine outputs based on real-time feedback.
- The discussion highlights how this method allows developers to leverage the model's introspective capabilities, creating a tight iteration loop between building and refining applications.
Enforcing Structure with DSPy
- While structured outputs are beneficial, they require careful coordination when integrated into broader programs. The speaker emphasizes that DSPy is not the only way to build such applications, but it offers unique advantages once understood.
- Developers can quickly prototype applications using DSPy primitives, laying the groundwork for scaling up to production-level systems.
Optimization Techniques in Machine Learning
- The speaker shares insights on optimizers like MIPRO, emphasizing that well-structured data is crucial for effective outcomes. A smaller dataset (10 to 100 examples) can still yield significant improvements if the metrics are intuitive.
- An example of performance improvement from an entry corrector shows an increase from 86% to 89%. Metrics breakdown helps identify areas needing further refinement or adjustment in strategy.
Understanding Optimizer Outputs
- Optimizers produce serialized modules that can be saved and reused later. These modules manipulate prompt phrasing based on performance metrics observed during iterations.
- The optimizer dynamically adjusts prompts by identifying latent requirements not initially specified, enhancing overall model performance through learned adjustments based on data feedback.
Optimizing LLM Performance with DSPy
Utilizing AI to Enhance AI
- The discussion begins with the concept of using Large Language Models (LLMs) to improve their own performance by dynamically constructing new prompts, which are then iteratively refined.
- A question arises about why the solution object is not solely the optimized prompt, leading to clarification that while it can be accessed, there are additional components involved.
- The speaker mentions the importance of understanding the DSPy program object and its various elements beyond just the optimized prompt.
Exploring DSPyHub
- Introduction of a tool called DSPyHub, designed as a repository of optimized programs where experts can share their optimizations for specific datasets or tasks.
- An example is provided showing how an optimized program can be loaded and utilized, highlighting its output as a result of the optimization process.
Practical Applications of Optimized Programs
- The speaker discusses practical use cases such as document classification, where an optimized classifier could identify specific types of documents from a large dataset.
- Emphasis on flexibility in application; once an optimization is achieved, it can be repurposed across different projects or teams.
Feedback Mechanisms in Optimization
- A question about real-time feedback mechanisms leads to discussions on continuous learning and how delayed user feedback could be integrated into optimization processes.
- It’s noted that delayed metrics would need to be added back into the dataset for future optimizations, creating a feedback loop.
Cost Considerations in Using DSPy
- The conversation shifts to the cost implications of using DSPy extensively; costs depend largely on usage patterns, such as calling functions many times asynchronously.
- The speaker acknowledges that while DSPy programs may incur high costs due to frequent calls, effective management strategies can mitigate these expenses.
Understanding Context Management in Programming Paradigms
The Cost of Contextual Information
- The addition of contextual information to prompts is not inherently expensive; while the signature may be a simple string, the overall text sent to the model can be significantly longer.
- Context management is more about programming paradigms than cost; developers can create compressed adapters to minimize data sent to models.
Strategies for Managing Large Contexts
- If concerns arise regarding large contexts, additional logic can be implemented within the program or through an adapter to manage this effectively.
- Techniques such as context compression could be explored, and there are ongoing discussions about improving context management in future developments.
Future of Context Management
- There is speculation that either context windows will expand or that context management will become more abstracted over time, although no definitive solutions are currently available.