DSPy: The End of Prompt Engineering - Kevin Madura, AlixPartners
Introduction to DSPy
Overview of the Session
- The speaker welcomes attendees and introduces the topic of DSPy, encouraging questions throughout the session.
- Acknowledges that while there will be some slides, the focus will shift to coding in the latter half. A GitHub repository is available for participants to download code.
Understanding DSPy
- Defines DSPy as a declarative framework for building modular AI software, aimed at those who may not be full-time engineers.
- Shares personal experiences using DSPy for various projects, highlighting its efficiency in quickly iterating on applications and programs.
Practical Applications of DSPy
Code Demonstration and Use Cases
- Mentions several use cases that will be demonstrated: a sentiment classifier, PDF processing, multimodal work, a web research agent, text summarization, and optimization with GEPA.
Key Features of DSPy
- Emphasizes how DSPy allows users to decompose logic into programs that treat LLMs (Large Language Models) as first-class citizens.
- Highlights the structured outputs and input/output type guarantees provided by DSPy's primitives.
Advantages of Using DSPy
Abstraction Level
- Discusses how DSPy operates at a higher level of abstraction than frameworks like LangChain; it focuses on user intent rather than low-level details.
Program Development Focus
- Stresses that users are building proper Python programs instead of merely tweaking strings or prompts; this leads to more robust software development.
System Design Philosophy
Systems Mindset
- Quotes Omar Khattab (DSPy's creator), explaining that DSPy is designed with a systems mindset, which helps encode user intent effectively while adapting to evolving model capabilities.
Flexibility and Control Flow
- Notes that while retaining control flow within programs, users can switch between different models as needed without losing functionality.
Conclusion on Robustness and Alternatives
Convenience Without Compromise
- Affirms that convenience comes naturally with DSPy's design; it minimizes unnecessary parsing tasks while maintaining clarity in program structure.
Comparison with Other Libraries
- Acknowledges other libraries such as Pydantic AI and LangChain but emphasizes the unique aspects of DSPy's approach.
Understanding DSPy: Key Concepts and Applications
Introduction to DSPy
- The speaker emphasizes the importance of an open mind when exploring DSPy, suggesting that experimentation with code is crucial for understanding its functionality.
- This talk focuses on practical applications of DSPy rather than exhaustive details, aiming to share personal experiences and solutions found through using DSPy.
Core Concepts of DSPy
- The core concepts of DSPy are summarized into five or six key elements, which are elaborated upon throughout the discussion.
- Signatures define the desired function call by specifying inputs and outputs, allowing users to defer implementation details to the LLM (Large Language Model).
Modules and Tools in DSPy
- Modules serve as logical structures within a program, containing one or more signatures along with additional logic. Their design is modeled on PyTorch modules.
- In DSPy, tools are essentially Python functions that can be easily exposed to the LLM within its ecosystem.
Adapters and Their Role
- Adapters act as intermediaries between signatures and LLM calls, translating inputs/outputs into a format suitable for prompts sent to the LLM.
- There is ongoing research regarding optimal formats (e.g., XML, BAML, JSON), with adapters providing flexibility in choosing these formats.
Optimizers and Metrics
- Optimizers are a notable feature of DSPy but should not overshadow its other functionality; they enhance program structuring alongside signatures and modules.
- Metrics work in conjunction with optimizers, measuring success within a DSPy program and guiding the optimization path.
Signatures Explained
- Signatures express declarative intent through simple strings or complex class-based objects (like Pydantic), where field names can serve as mini-prompts for models.
- The naming of parameters is critical; intuitive names help models understand input requirements effectively.
Custom Prompts Integration
- Users can incorporate existing, effective prompts into their workflow without losing their value; this can be done via docstrings or direct string injection during prompt construction.
- The ability to start from custom prompts lets users build on proven strategies while leveraging the structure DSPy provides.
Understanding DSPy Modules and Their Functionality
Overview of DSPy Implementation
- The speaker discusses initial confusion around the shorthand version of implementing logic in DSPy, emphasizing that it allows users to defer complex implementations to the model.
- A simple sentiment classifier can be created by providing text input and receiving an integer output for sentiment, with additional instructions clarifying the meaning of different sentiment values.
- This shorthand approach facilitates quick experimentation and iteration without needing to craft detailed prompts from scratch.
Modular Structure in DSPy
- Modules serve as a foundational abstraction layer for DSPy programs, allowing users to build upon existing modules or create new ones based on effective techniques.
- The design encourages composability and optimization, enabling logical separation of program components while integrating LLM calls effectively.
Built-in Modules and Techniques
- The speaker mentions various built-in modules, such as dspy.Predict, which makes a straightforward language model call.
- Other built-in modules encode prompting methodologies such as "chain of thought," which guides models through reasoning steps, though some of these techniques are less relevant with today's models.
Tool Integration in DSPy
- ReAct serves as a tool-calling interface within DSPy, allowing Python functions to be injected into the model seamlessly.
- The "Program of Thought" module enables models to reason in code, returning results while supporting custom Python interpreters for specific workflows.
Practical Application Example
- An example illustrates how a module can ensure time entries adhere to formatting standards using defined signatures and business logic interspersed with LLM calls.
- The implementation involves defining signatures at the top level and utilizing vanilla predict calls alongside hard-coded logic for practical use cases.
Web Tools and Adapters in LLMs
Overview of Web Tools
- The speaker discusses the use of web tools, emphasizing a controlled approach by limiting operations to five rounds to prevent erratic behavior.
- Introduction of adapters as prompt formatters that convert input signatures into specified message formats for better interaction with language models.
Understanding Adapters
- Example provided on how a JSON adapter transforms a Pydantic object into a structured prompt for the LLM, showcasing input fields like clinical note type and patient info.
- Clarification on the existence of a base adapter that serves general purposes while allowing customization for specific needs.
Performance Comparison
- Reference to testing conducted by an individual named Pashant comparing JSON and BAML adapters, highlighting improved intuitiveness and potential performance gains (5-10%).
- Emphasis on how switching from JSON to BAML can enhance information presentation without altering the underlying program structure.
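Switching formats without touching the program can be sketched as a one-line configuration change; whether a BAML adapter ships in your DSPy version is an assumption to verify:

```python
import dspy

# The adapter re-renders every signature into the chosen prompt format and
# parses replies back; the program itself is unchanged.
dspy.configure(adapter=dspy.JSONAdapter())

# dspy.configure(adapter=dspy.BAMLAdapter())  # alternative format, if available
```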
Multimodality and File Handling
Multimodal Support in DSPy
- Discussion of DSPy's support for multiple modalities, including images and audio, facilitating easy integration into workflows.
- Mention of an additional library called "attachments" designed to simplify file handling across various formats, enhancing usability with LLMs.
Practical Application Example
- An example is given involving a PDF document where users can simply provide a link for processing without needing intricate setup or understanding of backend processes.
- The speaker illustrates using RAG (Retrieval-Augmented Generation), asking questions based on documents fed into the system, demonstrating ease of use.
Optimizers in Model Performance
Introduction to Optimizers
- Optimizers are introduced as powerful tools that may outperform traditional fine-tuning methods under certain conditions, particularly in in-context learning scenarios.
Benefits of Using Optimizers
- Encouragement to experiment with optimizers before resorting to extensive infrastructure setups; they offer essential primitives for measuring and improving model performance quantitatively.
Transferability Through Optimization
- Explanation of how optimizers enable transferability between tasks and models; by adjusting the model configuration (e.g., swapping GPT-4.1 for GPT-4.1 nano), operational costs can be reduced significantly while maintaining acceptable performance.
Understanding DSPy and Its Optimization Techniques
Overview of DSPy Functionality
- The core optimization loop iteratively refines prompts, enhancing performance across a structured program composed of various modules.
- DSPy optimizes components within a program to improve input-output performance, its role being that of a facilitator rather than an optimizer itself.
Clarifying the Role of DSPy
- Omar clarifies that while DSPy is not itself an optimizer, it provides programming abstractions that allow optimization as an added benefit.
- Insights from Dwarkesh Patel and Andrej Karpathy highlight potential pitfalls of using LLMs as judges, given models' ability to exploit weaknesses in their evaluators.
Adversarial Examples and Model Performance
- The discussion points out that LLMs can find adversarial examples that degrade performance when LLMs are used improperly as evaluators.
- Optimizers in DSPy probe these same weaknesses constructively, identifying areas of the program that need improvement.
Program Construction and Metrics
- A logical flow in constructing programs involves decomposing logic into modules and utilizing metrics to define operational contours.
- Recent discussions indicate that current optimizers are performing on par with or exceeding traditional fine-tuning methods like GRPO, showcasing advancements in prompt optimization techniques.
Defining Success Through Metrics
- Metrics serve as foundational elements for defining success criteria for optimizers, guiding them in assessing the impact of prompt adjustments on performance.
- Examples illustrate how metrics can range from rigorous equality checks to more subjective evaluations based on adherence to defined criteria.
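The range from strict equality to softer judgment starts with the simplest case; a metric is just a function taking a gold example, a prediction, and an optional trace, and returning a score. The `urgency` field name below is an assumption for illustration:

```python
# Strict exact-match metric: returns True/False, which optimizers treat as 1/0.
def urgency_metric(example, pred, trace=None):
    return example.urgency == pred.urgency
```

A more subjective metric could instead call an LLM judge inside the function and return a graded score.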
Building Complex Workflows with DSPy
- The speaker emphasizes that DSPy equips users with the essential primitives for constructing complex workflows and data-processing pipelines involving LLM integration.
Community Engagement and Practical Application
- Encouragement is given to engage with online communities for further learning about the latest developments related to DSPy and its applications.
- Transitioning into practical demonstrations, the speaker prepares to showcase code examples relevant to discussed concepts.
Introduction to DSPy and Practical Applications
Overview of the Session
- The session begins with a focus on practical applications of DSPy, encouraging questions and interaction while exploring various Python programs.
- A configuration object is introduced for managing multiple language models (LMs), highlighting the need for different models based on workload complexity.
Utilizing Multiple Language Models
- The speaker discusses using OpenRouter API keys to access three different LMs: Claude, Gemini, and GPT-4.1 mini.
- Each model's free-form response is subjective; however, DSPy allows answers to be defined more strictly by limiting the options through Literal type definitions.
Caching Mechanism in DSPy
- The caching feature in DSPy enhances performance by loading previous results quickly if no changes are made to signature definitions.
- This caching capability is particularly useful during testing, ensuring efficient retrieval of data.
Building a Simple Sentiment Classifier
Implementation Details
- A simple sentiment classifier is demonstrated where text input determines sentiment scores based on predefined criteria (lower scores indicate negative sentiment).
- Example inputs showcase how the classifier evaluates sentiments, illustrating its functionality despite seeming basic.
Signature Utilization
- The importance of shorthand signatures in DSPy is emphasized; they simplify passing parameters like strings and integers into prediction objects.
Usage Tracking and Document Analysis
Built-in Usage Information
- DSPy provides detailed usage information per call, including token-usage metrics that aid observability and optimization efforts.
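A sketch of per-call usage tracking; the exact API surface (`track_usage` / `get_lm_usage`) is an assumption to verify against your DSPy version's documentation:

```python
import dspy

# Enable usage tracking globally (assumed flag in recent DSPy versions).
dspy.configure(track_usage=True)

# pred = dspy.Predict("question -> answer")(question="...")
# pred.get_lm_usage()  # per-model token counts for this specific call
```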
Document Handling Capabilities
- An example involving document analysis showcases how attachments can automatically extract relevant content from PDFs without manual intervention.
Advanced Document Analysis Techniques
Creative Data Structures
- The power of DSPy lies in its ability to handle complex data structures effortlessly due to its integration with Pydantic.
Deferring Structure Definition to Models
- By allowing the model to define key-value pairs from documents autonomously, users can streamline their analysis processes significantly.
Document Analysis and Schema in DSPy
Overview of Document Analyzer Schema
- The document analyzer schema is crucial for extracting important information such as filing dates, which defines the structure of the document analysis.
- The output from the document schema includes key elements like filing date, form date, and form type, formatted for structured outputs.
Accessing Results and Inspecting History
- Dot notation allows quick access to resulting objects from the document analysis process.
- The "inspect history" feature provides a raw dump of system messages and input/output fields used during processing.
Response Format and Adapter Usage
- Responses follow a specific format whose fields are parsed by DSPy's adapter.
- An example using a BAML adapter demonstrates how to define simple models with patient details integrated into clinical notes.
LLM Calls and Context Management
- Two calls are made: one using a smart LLM with built-in adapters, and another utilizing a defined BAML adapter.
- Python's context management allows switching between different LLM definitions for specific calls without affecting global settings.
Comparing Outputs from Different Adapters
- Both LLM calls yield identical outputs despite differing adapter usage; this highlights flexibility in model application.
- Inspecting history reveals differences in prompt structures between JSON schema and BAML notation, with BAML being more comprehensible.
Multimodal Examples in Document Analysis
Image Analysis Use Case
- A multimodal example involves analyzing street signs to determine parking availability based on time of day.
Reasoning Integration in Responses
- By default, responses include reasoning when using dspy.ChainOfThought; switching to dspy.Predict would exclude reasoning from the results.
Module Structure for Logic Implementation
- Modules encapsulate logic into replicable units; they can incorporate arbitrary business logic or control flows within their design.
Understanding Tool Calling in AI Modules
Core Logic Invocation
- The core logic of the AI module runs when the instantiated module is called; a simple example shows an analyzer ("AIE123") being invoked, with a counter incrementing on each call.
Function Definitions and Module Creation
- Two functions are defined, perplexity_search and get_url_content, as part of creating a bio-research agent module that uses Gemini 2.5 as its LLM.
Async Functionality and Tool Calling
- An answer-generator object is created using dspy.ReAct, enabling tool calling. An async version of the function is also implemented to improve performance.
Data Processing and Classification
- The system loops through instances to determine if individuals have been at their companies for over ten years, utilizing tool calls for up-to-date information.
Debugging Insights
- Similar to reasoning objects in chain-of-thought models, debugging insights can be obtained from tool calls, including arguments passed and observations made during execution.
Exploring Dataset Creation and Metrics
Dataset Overview
- A dataset is created that categorizes various help messages (e.g., "my sync is broken") into classifications such as positive, neutral, or negative urgency levels.
Support Analyzer Module Development
- Multiple modules are packed into a single support analyzer module which defines metrics based on the dataset's characteristics to evaluate message urgency effectively.
Performance Evaluation Process
- The model's performance evaluation involves applying metrics iteratively to refine prompts based on feedback received from previous iterations.
Feedback Mechanisms in Model Training
Teacher Model Feedback Utilization
- In GEPA training, feedback from a teacher model provides textual insights on classification errors, enhancing the learning process by explaining the mistakes made by smaller models.
Iterative Optimization Loop
- This feedback mechanism tightens the iteration loop for prompt optimization by providing specific guidance on how to adjust responses based on prior inaccuracies.
Practical Applications of AI Processing
File Categorization Example
- Demonstrates processing various file types (contracts, images), showcasing how clients can manage large datasets with unknown contents efficiently through classification methods.
Dynamic Model Routing
- The implementation uses different models depending on file type detected; for instance, routing image requests specifically to Gemini models optimized for image recognition tasks.
Document Classification and Summarization Techniques
Overview of Document Processing
- The speaker discusses different approaches to handling various types of documents, such as SEC filings and contracts, emphasizing the need for tailored processing methods based on document type.
- A document classifier is utilized to predict the type of a file by analyzing images extracted from it, ensuring accurate categorization.
- The classification process involves determining if the document is an SEC filing, patent filing, contract, or related to city infrastructure among other categories.
Model Functionality and Use Cases
- The model's effectiveness in classifying diverse document types is highlighted; it can handle multiple categories without issues.
- For contracts specifically, a summarizer object is created that processes each page recursively to extract key information and boundaries within the document.
Summarization Process
- The summarization technique includes detecting boundaries within documents while generating summaries for better comprehension of content structure.
- An example illustrates reliance on the model's ability to classify city infrastructure documents based solely on visual cues present in images.
Practical Application Insights
- In production scenarios, more rigorous classification methods may be necessary, potentially involving multiple models for enhanced accuracy.
- The summarization logic iteratively works through chunks of a contract to create concise summaries while also identifying sections like exhibits or schedules.
Boundary Detection Mechanism
- A detailed explanation follows regarding how PDF pages are converted into images for classification purposes before being processed for boundary detection.
- The output indicates clear demarcation between main documents and supplementary materials (e.g., schedules), showcasing effective boundary detection techniques.
Conclusion on Implementation Challenges
- Discussion shifts towards challenges faced during implementation; asynchronous calls are made for classifying each image in a PDF effectively.
- Emphasis is placed on improving code quality over time with ongoing development efforts aimed at refining these processes.
Understanding the Use of Structured Outputs in Machine Learning
Overview of Signature and Output Generation
- The speaker discusses a signature that defines a tuple for input, which leads to generating a dictionary output with specific types (string, tuple, integer). This process is described as quick and efficient despite being non-production code.
- The initial implementation has shown promising results during testing. There are opportunities for optimization and improvement, but the basic functionality works well.
Iterative Development Process
- The model utilizes self-reflection by calling functions like get_page_images to verify boundaries within the data. This iterative approach helps refine outputs based on real-time feedback.
- The discussion highlights how this method allows developers to leverage the model's introspective capabilities, creating a tight iteration loop between building and refining applications.
Enforcing Structure with DSPy
- While structured outputs are beneficial, they require careful coordination when integrated into broader programs. The speaker emphasizes that DSPy is not the only way to build such applications, but it offers unique advantages once understood.
- Developers can quickly prototype applications using DSPy primitives, laying the groundwork for scaling up to production-level systems.
Optimization Techniques in Machine Learning
- The speaker shares insights on optimizers like MIPRO, emphasizing that well-structured data is crucial for effective outcomes. A smaller dataset (10 to 100 examples) can still yield significant improvements if the metrics are intuitive.
- An example of performance improvement from an entry corrector shows an increase from 86% to 89%. Metrics breakdown helps identify areas needing further refinement or adjustment in strategy.
Understanding Optimizer Outputs
- Optimizers produce serialized modules that can be saved and reused later. These modules manipulate prompt phrasing based on performance metrics observed during iterations.
- The optimizer dynamically adjusts prompts by identifying latent requirements not initially specified, enhancing overall model performance through learned adjustments based on data feedback.
Optimizing LLM Performance with DSPy
Utilizing AI to Enhance AI
- The discussion begins with the concept of using Large Language Models (LLMs) to improve their own performance by dynamically constructing new prompts, which are then iteratively refined.
- A question arises about why the solution object is not solely the optimized prompt, leading to clarification that while it can be accessed, there are additional components involved.
- The speaker mentions the importance of understanding the DSPy program object and its various elements beyond just the optimized prompt.
Exploring DSPyHub
- Introduction of a tool called DSPyHub, designed as a repository of optimized programs where experts can share their optimizations for specific datasets or tasks.
- An example is provided showing how an optimized program can be loaded and utilized, highlighting its output as a result of the optimization process.
Practical Applications of Optimized Programs
- The speaker discusses practical use cases such as document classification, where an optimized classifier could identify specific types of documents from a large dataset.
- Emphasis on flexibility in application; once an optimization is achieved, it can be repurposed across different projects or teams.
Feedback Mechanisms in Optimization
- A question about real-time feedback mechanisms leads to discussions on continuous learning and how delayed user feedback could be integrated into optimization processes.
- It’s noted that delayed metrics would need to be added back into the dataset for future optimizations, creating a feedback loop.
Cost Considerations in Using DSPy
- The conversation shifts to the cost implications of using DSPy extensively; costs depend largely on usage patterns, such as calling functions many times asynchronously.
- The speaker acknowledges that while DSPy programs may incur high costs due to frequent calls, effective management strategies can mitigate these expenses.
Understanding Context Management in Programming Paradigms
The Cost of Contextual Information
- The addition of contextual information to prompts is not inherently expensive; while the signature may be a simple string, the overall text sent to the model can be significantly longer.
- Context management is more about programming paradigms than cost; developers can create compressed adapters to minimize data sent to models.
Strategies for Managing Large Contexts
- If concerns arise regarding large contexts, additional logic can be implemented within the program or through an adapter to manage this effectively.
- Techniques such as context compression could be explored, and there are ongoing discussions about improving context management in future developments.
Future of Context Management
- There is speculation that either context windows will expand or that context management will become more abstracted over time, although no definitive solutions are currently available.