Instructor with Jason Liu - Weaviate Podcast #88!

Introduction to the Weaviate Podcast

In this section, the host introduces the podcast and welcomes Jason Liu as a guest. They discuss Jason's presentation at the AI Engineer Conference and his work with Pydantic.

Overview of Instructor and Structured Output Parsing

    • Jason discusses his project, Instructor, which enables structured output parsing with large language models (LLMs).
    • The host mentions Jason's impactful blog posts and lessons on engineering and highlights his experience as an independent consultant.
    • The host expresses excitement about learning from Jason during the conversation.

Creating Structured Data with Language Models

    • The host asks Jason for an overview of Instructor and what led him to create it.
    • Jason explains that when he used language models in the past, he often needed structured data as output. He shares an example from his time at Stitch Fix where they converted request notes into search queries using JSON.
    • Function calling allowed them to specify structured data, making programming with language models more compatible with existing code.
    • By using functions, structs, and methods, programming with language models becomes more straightforward and backward-compatible.
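As a sketch of what that looks like in practice (assuming Pydantic v2; `SearchQuery` and its fields are hypothetical stand-ins for the Stitch Fix request-note example), a plain class doubles as both a type for your code and a JSON schema for the model:

```python
from pydantic import BaseModel, Field

class SearchQuery(BaseModel):
    """A structured search request extracted from a free-form request note."""
    query: str = Field(description="Keywords to search the catalog with")
    category: str = Field(description="Product category to filter on")

# This JSON schema is what would be passed to the model as a function/tool definition.
schema = SearchQuery.model_json_schema()
```

The same class is then usable downstream as an ordinary typed object, which is what makes the approach backward-compatible with existing code.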

Making LLMs Backwards Compatible

    • The host mentions how making LLMs backwards compatible is a powerful concept. Passing data between different components is crucial for effective use of language models.
    • The host refers to Jason's example of reranking and explains how Instructor allows for multiple actions from the language model, such as providing explanations along with reranking.

Validation and Error Handling in Instructor

    • The host asks Jason about the validation and error handling mechanisms in Instructor.
    • Jason explains that Instructor is implemented in JavaScript and Python, leveraging well-supported libraries like Zod and Pydantic for validation.
    • When using function calling or tool use, a JSON schema is passed to specify type information. If errors occur during output generation, Instructor captures them, formats them, and presents them back to the language model.
    • By sending a new message with error details instead of retrying the same request, better outputs can be obtained from the language model.

Fine-tuning Models for Specific Tasks

    • The host discusses the idea of fine-tuning models for specific tasks, such as taking an original attempt together with its error message and generating a corrected output.

The Secret Sauce of IDE Outputs

In this section, the speaker discusses the importance of making the IDE happy with the outputs generated by language models. This ensures that the desired objects are obtained and improves fault tolerance.

  • The real secret sauce is making the IDE happy with outputs.
  • IDE integration helps ensure that the correct object is obtained.
  • Weaker models struggle to provide accurate results, even with rephrasing.
  • Using Instructor to generate synthetic data for fine-tuning can improve model performance.

Using Instructor for Synthetic Data Generation

The speaker explores the concept of using Instructor to generate synthetic data for fine-tuning language models. This approach can help with challenging tasks and reduce reliance on validators.

  • Fine-tuning with synthetic data can help achieve one-shot correct answers.
  • Validators may become unnecessary in the long run if models are purpose-built for the task.
  • Integrating Instructor tightly with programming workflows offers significant benefits.

Function Calling Models and their Adoption

The speaker discusses the current state of function calling models and their adoption in various platforms and startups.

  • Function calling is currently a means to an end when used with Instructor and other tools like Mixtral implementations.
  • Companies like OpenAI, Scale, Mistral, and Nexus have incorporated function calling into their products.
  • Function calling may become more separated from other tools in terms of choosing the right tool for specific tasks.

Organizing Prompts for Cleaner Codebases

The speaker highlights how organizing prompts using Instructor leads to cleaner codebases by separating examples, schema, and data.

  • Organizing prompts makes codebases cleaner by separating examples from schema and data.
  • Previously, incorrect prompts could lead to extraction errors without realizing it due to prompt design.
  • Specifying schema and data out of band creates a safer programming environment.

The Impact of Organized Prompts on Programming Environment

The speaker discusses the impact of organized prompts on programming environments, emphasizing the benefits of strict typing and avoiding parsing issues.

  • Organized prompts enable strict typing and optional arguments, leading to cleaner code.
  • Out-of-band specification reduces parsing issues and ensures a safer programming environment.
  • Unstructured, as a company, utilizes Instructor prompts for organizing ETL tasks.

Community Sourcing and Adoption of Language Models

The speaker highlights the role of community sourcing in the success of projects like LangChain.

  • Community sourcing played a significant role in the adoption and success of LangChain.

Using LLMs for Writing Blog Posts

The speaker expresses excitement about using LLMs to write blog posts. They discuss the potential of LLM frameworks to quickly showcase and share code snippets, such as Instructor schemas, which could go viral on platforms like Twitter due to their appealing syntax.

Potential for Sharing Chains and Syntax

  • The speaker wonders if this could lead to a platform similar to Hugging Face for LLM (large language model) programs, where people can easily share the chains they are using.
  • They believe that within a company, there are already examples of sharing schema libraries or prompt libraries. These modular components can be shared and utilized in various applications.
  • The speaker mentions transcript summarization APIs as an example of a modular component that can be shared. They highlight the ability to specify action items with a simple syntax like "action items: a list of action items."
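A hedged sketch of such a shareable schema (assuming Pydantic v2; the class and field names are illustrative, not a published API):

```python
from pydantic import BaseModel

class TranscriptSummary(BaseModel):
    """A reusable, shareable schema for summarizing a transcript."""
    summary: str
    action_items: list[str]

# Validating a dict the way Instructor would validate a model's JSON output:
result = TranscriptSummary.model_validate({
    "summary": "Discussed the Q3 launch plan.",
    "action_items": ["Draft the announcement", "Schedule a demo"],
})
```

Because the schema is just a class, it can be imported and reused across applications like any other library code.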

Public-Facing Components and Specialized Models

The discussion revolves around the possibility of making modular components more public-facing. Additionally, the potential emergence of specialized models in conjunction with LLM is explored.

Sharing Modular Components Publicly

  • The speaker believes that many modular components developed within companies can be made more public-facing.
  • Examples include the best summary extractions or extracting pain points from sales transcripts.
  • By sharing these components publicly, others can benefit from pre-built solutions.

Specialized Models and Tools

  • The speaker suggests that one advantage of utilizing LLM frameworks is the ability to add specialized models in between chains.
  • This flexibility allows for more advanced functionalities beyond just object-oriented programming.

Pydantic Schema for Multihop Question Answering

The concept of using a Pydantic schema for multihop question answering is introduced. The speaker explains how Pydantic allows for defining tree-like data structures and recursive definitions, enabling the creation of complex query plans.

Tree-Like Data Structures with Pydantic

  • Pydantic makes it easy to define data structures that resemble trees.
  • Instead of having multiple steps in a language model producing output, Pydantic enables the creation of a single data structure representing the entire query plan.
  • This approach simplifies the workflow and allows for visualization, review, and dispatching to execution engines.
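Such a recursive query plan can be sketched with a self-referencing model (assuming Pydantic v2; the `Query` shape and questions are illustrative):

```python
from __future__ import annotations

from pydantic import BaseModel

class Query(BaseModel):
    """One node in a tree-shaped query plan."""
    question: str
    dependencies: list[Query] = []

Query.model_rebuild()  # resolve the self-reference

plan = Query(
    question="Write a blog post comparing two vector databases",
    dependencies=[
        Query(question="What features does database A have?"),
        Query(question="What features does database B have?"),
    ],
)
```

The whole plan is a single validated object, so it can be visualized, reviewed, or handed to an execution engine as one unit.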

Parallelizing Tasks and LLM Reranking

The speaker discusses the benefits of parallelizing tasks using structured outputs from LLMs. They also mention LLM reranking as an area of interest.

Parallelization through Structured Outputs

  • By providing a complex task to an LLM, such as writing a blog post on a specific topic, one can generate different queries for various subtopics.
  • This approach allows for parallelization by assembling the required information simultaneously.

LLM Reranking and Instructor in Golang

  • The speaker expresses interest in LLM reranking and its potential application in their work at Weaviate.
  • They ask about an Instructor-style library in Golang and how much of the heavy lifting Pydantic does in terms of validating schema types.

Writing Custom Libraries with Instructor-like API

The speaker discusses the development process of creating custom libraries with an Instructor-like API. They emphasize that most of the work involves organizing typing criteria to ensure IDE support.

Developing Custom Libraries with Instructor-like API

  • The base library code is relatively small (around 200 lines) but focuses on organizing typing criteria for better IDE support.
  • Examples are given where similar APIs have been implemented in other languages like Elixir and Rust.
  • The speaker encourages others to think about how an Instructor-like pattern could work in their respective languages.

The Importance of ID Verification

In this section, the speaker discusses the importance of verifying IDs to avoid hallucinations in data. They explain that using three-character codes as IDs reduces the likelihood of hallucinations and suggests that these codes should be within a certain set.

  • Verifying IDs is crucial to prevent hallucinations in data.
  • Using three-character codes as IDs reduces the chances of hallucination.
  • It is recommended to ensure that these three-character codes are within a specific set for a more reliable system.
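A minimal sketch of that membership check, with a hypothetical allow-list of three-character IDs:

```python
# Hypothetical allow-list of three-character chunk IDs known to the system.
VALID_IDS = {"abc", "def", "xyz"}

def validate_ids(ids: list[str]) -> list[str]:
    """Reject any ID the model produced that is not in the known set."""
    hallucinated = [i for i in ids if i not in VALID_IDS]
    if hallucinated:
        raise ValueError(f"hallucinated IDs: {hallucinated}")
    return ids
```

Raising an error here (rather than silently dropping bad IDs) is what lets a reask loop feed the problem back to the model.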

What Verification Unlocks

The speaker emphasizes how verification unlocks various possibilities and prevents potential issues. They mention that without verification, one would have to rely on specific instructions or code checks, which can lead to complications.

  • Verification plays a significant role in unlocking potential opportunities.
  • Without verification, relying solely on instructions or code checks can create complications.
  • Verification ensures accuracy and reliability in systems.

Retrieval Augmented Generation (RAG)

The speaker highlights the connection between retrieval augmented generation (RAG) and recommendation systems. They draw parallels between RAG and their previous work at Stitch Fix, where human requests were transformed into recommendations by stylists.

  • RAG is similar to recommendation systems used at Stitch Fix.
  • At Stitch Fix, human requests were transformed into recommendations by stylists.
  • In RAG, a language model plays the stylist's role, generating recommendations from a retrieved inventory.

User Representation in RAG

The speaker discusses the importance of user representation in retrieval augmented generation (RAG). They explain how user preferences and characteristics need to be considered when generating recommendations.

  • User representation is crucial in RAG.
  • User preferences and characteristics should be taken into account for generating personalized recommendations.

RAG vs. Recommendation Systems

The speaker compares retrieval augmented generation (RAG) to traditional recommendation systems. They explain that RAG substitutes a language model for the personal stylist and a document index for the inventory.

  • RAG uses a language model in place of the personal stylist and a document index in place of the inventory.
  • Traditional recommendation systems rely on curated inventory and learned ranking models.
  • Both approaches aim to generate personalized recommendations.

Applying Expertise to Complex Applications

The speaker discusses how their expertise in recommendation systems can be applied to building more complex applications. They mention the importance of bringing knowledge from previous experiences to help others develop advanced programs.

  • Expertise in recommendation systems can be applied to building complex applications.
  • Sharing knowledge and experience helps others develop advanced programs.

User Representation Challenges

The speaker addresses challenges related to user representation in recommendation systems. They introduce Ref2Vec as a way to represent users based on their interactions, but acknowledge limitations in terms of diversity.

  • Challenges exist in representing users accurately in recommendation systems.
  • Ref2Vec is introduced as a method for user representation based on interactions.
  • Limitations regarding diversity arise with the use of Ref2Vec.

Multi Vector Approach

The speaker discusses the concept of a multi-vector approach for user representation. They explain how clustering vectors can provide more diverse recommendations and mention Pinterest's approach using a multi-armed bandit problem.

  • A multi-vector approach enhances diversity in recommendations by clustering vectors.
  • Pinterest's approach involves solving a multi-armed bandit problem.

Representation with Ref2Vec

The speaker elaborates on the Ref2Vec method for user representation. They explain how averaging vectors can be used to represent users based on their interactions and preferences.

  • Ref2Vec represents users by averaging vectors based on their interactions.
  • Averaging vectors helps capture user preferences and characteristics.

Matrix Factorization and Sequence Modeling

The speaker discusses the use of matrix factorization and sequence modeling in recommendation systems. They mention the importance of considering time-weighted interactions for accurate predictions.

  • Matrix factorization and sequence modeling are utilized in recommendation systems.
  • Time-weighted interactions play a crucial role in accurate predictions.

Fine-tuning Recommendations

The speaker expresses curiosity about fine-tuning recommendations using transformers. They discuss the potential of graph neural networks or matrix factorization to aggregate clusters into one vector.

  • Fine-tuning recommendations using transformers is an area of interest.
  • Aggregating clusters into one vector can be achieved through graph neural networks or matrix factorization.

Diversity Sampling and Multivectors

The speaker explains diversity sampling as a method to enhance recommendations. They emphasize that multivectors are likely to be more effective unless fine-tuning is specific to a particular use case.

  • Diversity sampling enhances recommendations by considering various factors.
  • Multi-vectors are likely more effective unless the embedding is fine-tuned for a particular use case.

Viewing Preferences of Jason

In this section, the speaker discusses how Jason viewed different types of shoes, including LeBron James shoes for 3 seconds, Kobe Bryant shoes for 2 seconds, and Lionel Messi shoes for 30 seconds. This sequence prompts the speaker to consider showing more soccer shoes.

Jason's Viewing Preferences

  • Jason viewed LeBron James shoes for 3 seconds.
  • He looked at Kobe Bryant shoes for 2 seconds.
  • Lionel Messi shoes caught his attention for a longer duration of 30 seconds.

Using Language Models (LMs) in Ranking

The speaker discusses the use of language models (LMs) as an easy way to enhance ranking. While they may not be used extensively in re-ranking settings, LMs can be valuable in generating training data to boost ranking models.

Benefits of LMs in Ranking

  • LMs can generate training data quickly and efficiently before product launches.
  • By using LMs to produce training data initially, faster and lower latency ranking models can be trained.
  • As more production data becomes available, the reliance on LM-generated data can gradually decrease or continue alongside real user data.

Generating Data with LMs

The speaker explores the idea of using LMs to simulate user reactions and preferences. They mention having LMs with persona roles and discuss the potential power of using these simulations to generate training data.

Simulating User Reactions with LMs

  • LMs with persona roles can simulate how users might react to recommendations.
  • Similar concepts have been explored by Google with their "RecSim" idea.
  • Using LMs to generate data presents a powerful opportunity for training models.

Lightweight Ranker and Fine-tuning Models

The speaker discusses the concept of using a lightweight ranker until enough data is available, followed by fine-tuning embedding or ranking models. They highlight the importance of time signals and suggest that embeddings could be effective features in reranking models.

Lightweight Ranker and Fine-tuning Models

  • A lightweight ranker can be used until sufficient data is collected.
  • Once enough data is available, fine-tuning an embedding or ranking model becomes feasible.
  • Embeddings are likely to be one of the most effective features in such models.

Pre-computation and Latency Constraints

The speaker emphasizes the significance of pre-computation in reducing latency constraints. They mention using generative AI and deep learning to perform extensive work behind the scenes, allowing for faster inference times.

Pre-computation and Latency Constraints

  • Pre-computing with generative AI and deep learning can optimize latency constraints.
  • Multiple embedding models can be utilized for feature extraction.
  • Inference steps can involve computing dot products quickly, followed by logistic regression.
  • This approach proves beneficial in e-commerce settings where low latency is crucial.
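A sketch of that inference path, with hypothetical precomputed embeddings: one dot product per embedding model serves as a feature, and a logistic regression combines them into a score.

```python
import math

def rank_items(user_vecs: list[list[float]],
               item_vecs_per_model: list[list[list[float]]],
               coef: list[float], bias: float) -> list[float]:
    """Score items cheaply from precomputed embeddings: a dot product per
    embedding model as a feature, then logistic regression on top."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    scores = []
    n_items = len(item_vecs_per_model[0])
    for j in range(n_items):
        feats = [dot(user_vecs[m], item_vecs_per_model[m][j])
                 for m in range(len(user_vecs))]
        z = sum(c * f for c, f in zip(coef, feats)) + bias
        scores.append(1 / (1 + math.exp(-z)))  # sigmoid
    return scores
```

Because the expensive embedding work happens offline, the online step is just a handful of dot products and a sigmoid, which keeps latency low in e-commerce settings.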

Balancing Latency Constraints with Conversational Shopping Experiences

The speaker reflects on how conversational shopping experiences might tolerate higher latency compared to traditional e-commerce settings. They discuss the effectiveness of character AI in increasing time spent by users during conversations.

Balancing Latency Constraints with Conversational Shopping Experiences

  • Conversational shopping experiences may allow for more tolerance towards higher latency due to user engagement.
  • Character AI's ability to optimize time spent can be valuable in such settings.
  • The speaker wonders how this approach could be applied to the e-commerce world, where time to check out is crucial.

Importance of Time to Check Out in E-commerce

The speaker highlights the significance of time to check out in e-commerce. They mention Amazon's mean time to check out and discuss the implications of latency on user decisions.

Importance of Time to Check Out in E-commerce

  • Amazon's mean time to check out for a search request is approximately 35 seconds.
  • Users visiting e-commerce platforms often have a clear idea of what they want and prioritize quick purchasing decisions.
  • Conversational shopping experiences may require longer conversations, potentially leading to distractions and abandonment.
  • Character AI can optimize time spent by users but must consider the impact on business outcomes.

Streaming Outputs and Latency Solutions

The speaker discusses the benefits of streaming outputs for structured data, such as reranked IDs. By streaming the results back in real time, latency can be reduced. This approach allows partial results to be shown while the computation is still ongoing.

  • Streaming outputs with structured data can help solve latency issues.
  • Reranking IDs and showing partial results can improve user experience.
  • Streaming is particularly useful when dealing with large amounts of data.
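One way to picture the idea, as a plain-Python generator sketch (not Instructor's actual streaming API): each time a scored ID arrives, yield the best-so-far ranking so the UI can update before the stream finishes.

```python
def stream_rerank(scored_ids):
    """Yield the best-so-far ranking after each (id, score) pair arrives,
    so partial results can be shown while generation is still ongoing."""
    seen = []
    for item in scored_ids:
        seen.append(item)
        seen.sort(key=lambda pair: -pair[1])
        yield [id_ for id_, _ in seen]
```

Consuming the generator gives one snapshot per arriving item, with the final snapshot equal to the full reranked list.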

ETL for LLMs and Knowledge Graph Completion

The conversation shifts towards discussing ETL (Extract, Transform, Load) processes for large language models (LLMs) and companies like Unstructured. The potential return of knowledge graphs and their extraction from text chunks is also explored.

  • ETL-for-LLMs companies make sense due to the abundance of unstructured data.
  • Specific verticals, such as insurance or processing PDFs for solar panel installations, offer opportunities for successful ETL businesses.
  • Generic extraction tasks may be better suited to tools like Unstructured and table extractors.
  • Extracting tuples generically may not provide significant value across industries.
  • Business value lies in the data itself rather than just the extraction process.

Challenges of Generic vs Specific Vertical ETL

The challenges between generic extraction tasks and specific vertical-focused ETL are discussed. The importance of tackling idiosyncrasies within specific verticals to provide value is highlighted.

  • Tackling specific verticals allows for more efficient and valuable ETL solutions.
  • Companies specializing in parsing specific documents or specifications can attract more value by being highly accurate.
  • Generic models may struggle to provide strong business value due to lack of specificity.

Using Graph Databases for Knowledge Graphs

The conversation touches on the use of graph databases for Knowledge Graphs and the potential opportunities for ETL in this area.

  • Vector databases may not be ideal for Knowledge Graphs, and graph databases are recommended instead.
  • ETL can play a role in extracting tuples or chunking data for both Knowledge Graphs and vector databases.

Modal Experiment: Embedding All of Wikipedia

The discussion shifts to a Modal experiment involving embedding all of Wikipedia using Cohere's batch embeddings API. The benefits of local models and horizontal scaling are highlighted.

  • Cohere offers batch embeddings for all of Wikipedia, allowing efficient embedding at a low cost.
  • Local models are cost-effective, and horizontal scaling provides scalability.
  • Embedding all of Wikipedia quickly allows for various applications and analysis.

Highlighting Value in Infrastructure Companies

The importance of highlighting the value provided by infrastructure companies is discussed, along with the potential benefits of using their tools effectively.

  • Understanding how to utilize infrastructure tools effectively is crucial in explaining their value.
  • Cohere has done well by offering comprehensive solutions like Cohere embeddings for all of Wikipedia.
  • Emphasizing the affordability and scalability aspects can help showcase the value proposition.

Scaling Out GPU Compute with Modal

In this section, the speaker discusses the benefits of scaling out GPU compute using Modal and how it can enable faster experimentation and exploration in building useful models.

Scaling Out GPU Compute with Modal

  • By scaling out GPU compute using Modal, experiments that previously took 15 hours on a single GPU can now be completed in just 15 minutes.
  • The cost of running these experiments is also significantly reduced to only $15 due to the scalability of Modal.
  • Modal offers a serverless GPU compute solution that allows for easy scaling across multiple GPUs at an affordable cost.
  • The Modal API provides a simple way to define functions and apply them to text chunks for various batch tasks.

Modal and GPUs as Databases

In this section, the speaker explores the concept of using GPUs as databases and discusses their potential advantages in terms of data transfer speed and parallel processing capabilities.

GPUs as Databases

  • GPUs have high data transfer speeds, making them suitable for use as databases.
  • Massively parallel processing capabilities of GPUs make certain tasks like geospatial search or vector database operations more efficient.
  • Query planning and offloading queries to GPUs could potentially improve performance in certain scenarios, such as parallel searches or complex computations.

Fine-tuning Embedding Models

In this section, the speaker emphasizes the importance of fine-tuning embedding models for improving recommendations and discusses the current state of difficulty in fine-tuning.

Importance of Fine-tuning Embedding Models

  • Fine-tuning embedding models is crucial for improving recommendations and achieving better business outcomes.
  • As usage increases, it is essential for models to adapt and provide personalized recommendations with network effects.
  • Fine-tuning on public datasets may not always be relevant, as the focus should be on specific data sets that align with the desired business outcomes.

Importance of Fine-tuning on Company Data

The speaker emphasizes the importance of fine-tuning models on company-specific data to differentiate from competitors. Fine-tuning on a large volume of legal contract data, for example, can lead to more relevant results compared to using a generic model.

Fine-tuning for Differentiation

  • Fine-tuning on company data is crucial for differentiation.
  • Using an embedding model that is fine-tuned specifically for a company's unique dataset can provide better results than using a general-purpose model.
  • For example, if two companies have answered 20 million legal contract questions each, both may be using GPT-4. However, if one company has fine-tuned their model with their specific legal contract data, it will likely outperform the other company's generic model.

Benefits of Fine-Tuning Open Source Models

The speaker discusses the benefits of fine-tuning open source models with a smaller amount of examples. Even fine-tuning on 2,000 to 3,000 examples can outperform closed source models in specific tasks.

Outperforming Closed Source Models

  • Fine-tuning open source models with a smaller number of examples can yield impressive results.
  • Even with just 2,000 to 3,000 examples, the fine-tuned model can outperform closed source models in specific tasks.
  • This demonstrates the effectiveness and potential of fine-tuning even with limited training data.

Capturing Different Types of Relationships

The speaker highlights the importance of capturing different types of relationships between queries and answers. Semantic similarity alone may not be sufficient; considering question-answer relationships or other specific relationships can enhance performance.

Beyond Semantic Similarity

  • Currently, the focus is on semantic similarity between queries and answers.
  • However, there are other types of relationships that can be captured to improve performance.
  • For example, the DPR models from Facebook introduced a question-answer relationship where an answer is linked to a specific question, rather than just being semantically similar.
  • Exploring and fine-tuning models to capture specific relationships can be an emerging direction in NLP.

Context-Specific Similarity

The speaker discusses the challenge of determining similarity in context-specific scenarios. Preferences and negations can influence whether two statements should be considered similar or dissimilar based on the context.

Contextual Considerations

  • Determining similarity in context-specific scenarios requires considering various factors.
  • For example, if someone loves coffee and hates coffee, should these statements be considered similar or dissimilar?
  • The answer depends on the specific use case and desired outcome.
  • In some cases, maximizing distance (dissimilarity) may be preferred, while in others, similarity may be desired.
  • Building embedding models that align with the intended business outcome is crucial.

Business Outcomes and Embedding Models

The speaker emphasizes the importance of aligning embedding models with specific business outcomes. Different use cases require different approaches to maximize relevance or distance between embeddings.

Aligning with Business Outcomes

  • To achieve desired business outcomes, embedding models need to align with specific use cases.
  • For example, a dating app may require maximizing distance between bios for better matching.
  • On the other hand, extracting quotes from a first date conversation may require high similarity between responses.
  • Defining the business outcome helps determine how embedding models should be fine-tuned.

Experience with Sentence Transformers

The speaker shares their experience with using Sentence Transformers, particularly in the context of fine-tuning and training embedding models.

Experience with Sentence Transformers

  • The speaker primarily uses Sentence Transformers and PyTorch for fine-tuning embedding models.
  • They mention being less familiar with the newer Hugging Face tooling for training embedding models.
  • Training embedding models can be challenging, especially when dealing with contrastive loss and cross entropy on dot products.

Triplet Loss for Embedding Models

The speaker discusses the benefits of triplet loss for embedding models. Triplet loss helps maintain diversity by pushing similar examples closer together while pushing dissimilar examples apart.

Benefits of Triplet Loss

  • Triplet loss is favored by the speaker as it helps maintain diversity in embeddings.
  • By ensuring that similar examples are pushed closer together and dissimilar examples are pushed apart, triplet loss prevents everything from becoming too similar.
  • However, finding suitable negative examples for triplet loss can be challenging and highly dependent on the specific use case.
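For reference, the triplet hinge on squared Euclidean distances can be written as a small function (a plain-Python sketch of the loss, not a training loop):

```python
def triplet_loss(anchor: list[float], positive: list[float],
                 negative: list[float], margin: float = 1.0) -> float:
    """Hinge on squared distances: the positive must be closer to the
    anchor than the negative by at least `margin`, else there is a loss."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)
```

When the negative is already far enough away the loss is zero, which is why choosing informative (hard) negatives matters so much in practice.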

Hard Negatives and RAG Model

The speaker introduces the concept of hard negatives and explains how RAG (Retrieval-Augmented Generation) can help generate better training data by tracking cited and uncited information.

Generating Better Training Data

  • Finding appropriate negative examples for triplet loss is difficult but crucial.
  • The RAG setup offers a solution by tracking cited and uncited information during document retrieval.
  • Uncited documents serve as hard negatives, allowing for improved training data generation.
  • This approach enables fine-tuning embedding models without relying heavily on user data or manual intervention.

Validation Context in Instructor

The speaker explains how validation context works in Instructor and how it can be used to track relevant and irrelevant information.

Validation Context in Instructor

  • In Instructor, a validation context can be passed to every validator.
  • This context is borrowed from the Pydantic validators and serves as a reference.
  • By including relevant chunk IDs and chunk text in the validation context, it becomes possible to track which data was cited by the language model.
  • Validators can ensure that all cited information exists in the provided chunk ID list, flagging any inconsistencies or errors.
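A sketch of that pattern using Pydantic v2's validation context (the model and field names here are illustrative, not Instructor's exact API):

```python
from pydantic import BaseModel, ValidationInfo, field_validator

class CitedAnswer(BaseModel):
    answer: str
    chunk_ids: list[int]

    @field_validator("chunk_ids")
    @classmethod
    def ids_must_exist(cls, ids: list[int], info: ValidationInfo) -> list[int]:
        # The context carries the IDs of the chunks actually retrieved.
        known = (info.context or {}).get("chunk_ids", set())
        missing = [i for i in ids if i not in known]
        if missing:
            raise ValueError(f"cited chunks that were never retrieved: {missing}")
        return ids

ok = CitedAnswer.model_validate(
    {"answer": "See chunks 1 and 3.", "chunk_ids": [1, 3]},
    context={"chunk_ids": {1, 2, 3}},
)
```

A citation of a chunk ID outside the retrieved set fails validation, which a reask loop can then feed back to the model.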

Fuzzy Assertions with Instructor

The speaker discusses how fuzzy assertions can be made using Instructor's validation context, allowing for more nuanced evaluations of responses.

Fuzzy Assertions with Instructor

  • Instructor enables fuzzy assertions through its validation context feature.
  • By including specific information in the context, such as chunk IDs, it becomes possible to validate responses against expected references.
  • For example, ensuring that every integer representing a chunk ID used to generate an answer actually exists in the provided list.
  • This approach helps identify errors or inconsistencies in generated responses without heavy reliance on human intervention.

Using llm_validator and OpenAI Moderation for Text Validation

The speaker discusses two ways to achieve text validation in Instructor. The first method uses llm_validator, a function that takes in a set of rules and builds a validation function with an error message. The second method utilizes the OpenAI moderation tool by replacing the string type hint with a "moderated string" to pass the text through the content moderation endpoint.

  • llm_validator is a function that generates a validation function based on a set of rules.
  • The generated validation function checks whether the text breaks a rule (for example, that it is not mean) and provides an error message on how to fix it.
  • OpenAI moderation can be used by replacing the string type hint with a "moderated string" so the text passes through the content moderation tool.
  • Type hints are used to communicate with the language model and describe when data is correct or incorrect.
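The "function that takes rules and builds a validation function" idea can be sketched as follows. This is a hedged illustration, not Instructor's implementation: the real llm_validator asks a language model to judge the text, whereas here `judge` is a deterministic stand-in, and `make_rule_validator` and `toy_judge` are names invented for the sketch.

```python
# Hypothetical sketch of a rule-based validator factory in the spirit of
# llm_validator: given a rule, it returns a validation function that
# raises an error message explaining how to fix the text.

def make_rule_validator(rule, judge):
    """Build a validation function that enforces `rule` via `judge`."""
    def validate(text):
        verdict = judge(rule, text)  # in practice, an LLM call
        if not verdict["valid"]:
            # The error message tells the model how to repair its answer,
            # which a retry loop can feed back into the prompt.
            raise ValueError(f"Rule violated: {rule}. Fix: {verdict['fix']}")
        return text
    return validate

# Deterministic stand-in judge: flags text containing "stupid" as mean.
def toy_judge(rule, text):
    if "stupid" in text.lower():
        return {"valid": False, "fix": "remove insulting language"}
    return {"valid": True, "fix": ""}

no_meanness = make_rule_validator("don't say mean things", toy_judge)
print(no_meanness("You are a thoughtful person."))  # passes through unchanged
```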

The Future of Instructor from an Independent Consultant's Perspective

The speaker explains their perspective, as an independent consultant, on the future of Instructor. They compare it to building different modes of transportation, emphasizing that Instructor's goal is to provide a useful tool rather than become a full-fledged framework.

  • The analogy of building different modes of transportation (bike, motorcycle, car) represents incremental development and scoping in software projects.
  • Instructor aims to be like a good knife, providing specific functionality without trying to build an entire kitchen or restaurant.
  • Types are used in Instructor to influence IDE behavior and improve programming practices such as async usage and caching.
  • Instructor is designed as an anti-framework, focusing on making specific tasks easier rather than growing into a full-fledged application framework.

Instructor as a Lightweight and Invisible Tool

The speaker discusses the philosophy behind Instructor, highlighting its role as a lightweight and invisible tool that enhances the developer experience with OpenAI models. They compare it to other libraries like requests and emphasize the importance of clean code and ease of use.

  • Instructor is meant to be an invisible tool that improves the developer experience with OpenAI models.
  • The speaker mentions RAG (Retrieval-Augmented Generation) as an example of creating unique applications using powerful models.
  • Building a business around Instructor is not the main focus, as it is primarily used for consulting work and demonstrating AI system improvements.
  • The goal of Instructor is to teach "knife skills" in programming, focusing on writing better Python code rather than building complex frameworks.

The Power of Clean Code with Instructor

The speaker emphasizes the power of clean code achieved through using Instructor. They discuss how it simplifies programming with OpenAI's SDK, making code more readable and manageable.

  • Using Instructor significantly cleans up code by providing clear object-oriented structure instead of working with raw strings.
  • The impact may not be immediately noticeable, but removing Instructor from the workflow makes programming directly against OpenAI's SDK noticeably more challenging.
  • Cleaner code leads to improved readability and maintainability, enhancing the overall development experience.

Conclusion

Video description

Jason Liu is the creator of Instructor, one of the world's leading LLM frameworks, particularly focused on structured output parsing with LLMs, or as Jason puts it, "making LLMs more backwards compatible". It is hard to overstate the impact of Instructor; it is truly leading us into the next era of LLM programming. It was such an honor chatting with Jason; his experience currently as an independent consultant and previously engineering at Stitch Fix and Meta makes him truly one of the most unique guests we have featured on the Weaviate podcast! I hope you enjoy the podcast!

Links:
Pydantic is all you need: Jason Liu - https://www.youtube.com/watch?v=yj-wSRJwrrc
Course trailer for Jason's new course with Weights & Biases (amazing, one of the best I've ever seen): https://twitter.com/jxnlco/status/1756704806436040856
Instructor: https://github.com/jxnl/instructor
Modal embeddings: https://modal.com/blog/embedding-wikipedia
Attention Heads podcast with Jason Liu: https://www.youtube.com/watch?v=5-5jf3_mvBg
New Weaviate blog post related to what Jason describes at 6:30 on generating synthetic data for fine-tuning: https://weaviate.io/blog/fine-tuning-coheres-reranker

Chapters:
0:00 Welcome Jason!
1:10 Overview of Instructor
3:30 How does Instructor work?
6:00 Making the IDE happy
7:15 Function Calling
9:30 Cleaning Prompting Code
13:45 Instructor for RAG
18:50 RAG Consulting
20:26 Jason's thoughts on Ref2Vec
28:16 LLM Latency and Streaming
31:15 ETL for LLMs with Instructor
35:25 Modal Embeddings
41:18 Fine-Tuning Embedding models
50:20 Future Directions for Instructor