Learn to Spell: Prompt Engineering (LLM Bootcamp)

Introduction to Prompt Engineering

In this section, the speaker introduces the concept of prompt engineering and its importance in adjusting the text that goes into a language model to achieve desired behavior.

Prompt Engineering

  • Prompt engineering is the art of designing the text that is inputted into a language model to shape its behavior.
  • It involves adjusting and manipulating the text to get the desired output from the language model.
  • Prompt engineering focuses on writing prompts or changing the text that goes into a language model, rather than retrieval augmentation techniques covered earlier.

Using Language Models for Task Performance

This section discusses how prompt engineering replaces traditional approaches like training and fine-tuning when working with language models.

Language Models as Programmers

  • Prompt engineering allows us to program language models by providing specific instructions in English instead of using programming languages like Python or Rust.
  • Language models are statistical models of text: they predict a probability for the next word given the preceding text, based on their training data.
  • Training adjusts the model's weights so that it assigns higher probability to text resembling its training data.

Prompts as Magic Spells

The speaker explains why prompts are referred to as "magic spells" and provides insights into prompt design.

Understanding Prompts as Magic Spells

  • Referring to prompts as magic spells is metaphorical; they are not literally magical but rather rely on linear algebra and statistical modeling.
  • Language models are statistical models of text, capable of generating complex patterns based on learned data.
  • Prompts serve as instructions or inputs that guide language models in generating desired outputs.

Auto-regressive Models and Probability Prediction

This section delves into the concept of auto-regressive models and how language models predict probabilities for next tokens.

Auto-regressive Models and Probability Prediction

  • Language models are auto-regressive, meaning they predict the probability of the next token based on previously generated text.
  • By analyzing a list of tokens from a given text, language models assign probabilities to potential next words.
  • The model's training allows it to assign higher probabilities to more likely next tokens, gradually improving its performance.
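
The bullets above can be made concrete with a toy bigram model — an illustrative sketch of the autoregressive interface, not how transformers are implemented: estimate next-word probabilities from counts, then score a sequence with the chain rule.

```python
from collections import Counter, defaultdict

# Toy autoregressive model: estimate P(next word | previous word) from a tiny
# corpus, then score a sequence with the chain rule. Real LLMs condition on
# the entire prefix with a transformer, but the interface is the same.
corpus = "the cat sat on the mat . the cat ate the fish .".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word_probs(prev):
    """P(next | prev) as a dict, estimated from bigram counts."""
    total = sum(counts[prev].values())
    return {w: c / total for w, c in counts[prev].items()}

def sequence_prob(words):
    """Chain rule: P(w1..wn) ~ product of P(w_i | w_{i-1})."""
    p = 1.0
    for prev, nxt in zip(words, words[1:]):
        p *= next_word_probs(prev).get(nxt, 0.0)
    return p

print(next_word_probs("the"))  # 'cat' is the most likely continuation (0.5)
print(sequence_prob(["the", "cat", "sat"]))  # 0.25
```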

Intuition and Limitations of Statistical Modeling

This section explores the intuition behind statistical modeling in language models and highlights their limitations.

Intuition and Limitations of Statistical Modeling

  • Language models learn patterns from data, but understanding them purely as statistical models can lead to misconceptions.
  • Comparing language models to simpler statistical models like linear regression or Gaussian distributions may not capture their complexity.
  • Simple language models like Google's autocomplete are far more limited than modern large language models, so the analogy only goes so far.

Language Models' Ability to Learn Patterns

This section discusses how language models excel at learning patterns in text.

Learning Patterns with Language Models

  • Language models have learned extensively about text, making them highly proficient at recognizing patterns.
  • They can generate high-probability outputs for various texts drawn from the internet or written by users.
  • However, relying solely on statistical modeling may result in oversimplification and underestimation of language model capabilities.

Understanding Statistical Models as Programs

In this section, the speaker explains how statistical models can be thought of as programs that manipulate random data. They discuss the concept of using programming languages to generate answers and explore probabilistic programs represented by graphical models.

Statistical Models as Programs

  • Simple statistical models are often represented by equations or manipulations in probability theory.
  • However, more complex statistical models, like hierarchical linear regressions, can be better understood as programs that operate on random data.
  • Programming languages can be used to generate answers based on questions and brainstorming processes.
  • Language models are probabilistic, allowing for sampling and drawing different possibilities each time.
  • Probabilistic programs can be represented using graphical models.

Prompts as Magic Spells

The speaker draws inspiration from Arthur C. Clarke's laws of technology to explain the concept of prompts. They compare prompts to magic spells that achieve impossible effects but require following complex rules. Different types of prompts are discussed in relation to pre-trained models, instruction-tuned models, and agent simulation.

Prompts as Magic Spells

  • Prompts are like magic spells that allow us to achieve impossible effects within language models.
  • The speaker jokes that, as with magic, spending too much time studying spells can be bad for your mental health.
  • Three intuitions for using prompts come from the world of magic: alternate universes, wishes, and golems.
  • For pre-trained models like GPT-3 or LLaMA, a prompt serves as a portal to an alternate universe where the desired document exists.
  • Instruction-tuned models such as ChatGPT or Alpaca treat prompts as wishes for specific instructions.
  • In agent simulation with language models, a prompt creates a golem-like entity.

Prompts as Portals to Alternate Universes

The speaker explains the concept of prompts as portals to alternate universes within language models. They discuss how different words in a vocabulary can represent specific documents and highlight the probabilistic nature of language models.

Prompts as Portals to Alternate Universes

  • Text input into a language model acts as a portal to an alternate universe where desired documents exist.
  • Choosing a word from the vocabulary at each position progressively picks out one particular document from the space of all possible documents.
  • Language models assign weights or probabilities to all possible documents, creating a probabilistic model of text.
  • There are countless possible documents, making it challenging to find specific ones within the context of a transformer's limitations.

Understanding Probabilistic Models of Text

The speaker discusses probabilistic models of text and how they assign weights or probabilities to all possible documents. They highlight the vast number of potential documents and the desire to extract specific ones.

Understanding Probabilistic Models of Text

  • A language model serves as a probabilistic model of text by assigning weights or probabilities to all possible documents.
  • Having a probabilistic model means assigning a probability value to each document within the space of data.
  • The space of possible documents is astronomically large: with a vocabulary of tens of thousands of tokens and the long context windows of advanced transformers like GPT-4, the number of possible token sequences far exceeds anything that could be enumerated.
  • Extracting specific desired documents from this vast space is challenging but essential for various applications.

Language Models and Document Generation

This section explains how language models generate text based on conditioning and re-weighting documents.

Language Model and Document Generation

  • Language models predict the words that come after a given prompt in a document.
  • By inputting a few words as a prompt, the language model predicts the rest of the document.
  • The probability assigned to each word represents its likelihood in the generated document.

Influence of Prompt Choice

This section discusses how the choice of prompt influences the generation of documents.

Influence of Prompt Similarity

  • The choice of prompt influences the probability distribution of generated documents.
  • If the prompt is similar to certain prefixes in existing documents, those documents become more probable.
  • Documents with dissimilar prefixes become less probable due to re-weighting.
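
A minimal sketch of this re-weighting, using a made-up three-document "universe" with invented prior probabilities: conditioning on a prefix zeroes out documents that don't start with it and renormalizes the rest.

```python
# Toy "multiverse" of documents with (invented) prior probabilities.
universe = {
    "the rainforest is home to many species": 0.3,
    "the rainforest canopy blocks the light": 0.2,
    "tacos are best with fresh salsa":        0.5,
}

def condition_on(prompt, docs):
    """Keep only documents starting with the prompt, then renormalize."""
    kept = {d: p for d, p in docs.items() if d.startswith(prompt)}
    z = sum(kept.values())
    return {d: p / z for d, p in kept.items()}

posterior = condition_on("the rainforest", universe)
# The taco document becomes impossible; the two rainforest documents are
# re-weighted from (0.3, 0.2) up to (0.6, 0.4).
```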

Conditioning and Re-weighting

This section explains how conditioning is applied to probabilistic models for generating text.

Conditioning and Re-weighting

  • Conditioning involves adjusting probabilities based on a given prompt.
  • In document generation, conditioning focuses on specific universes or possibilities within all possible documents.
  • Re-weighting makes certain universes more likely than others in terms of generating text.

Focusing Text Generation

This section highlights how prompts help narrow down possibilities during text generation.

Focusing Text Generation

  • As words are added to a document, text generation narrows down from many possibilities to a specific world or universe.
  • Initially, before any words are written, numerous topics are possible for the document.
  • However, as information is revealed (e.g., mentioning David Attenborough), future tokens become more predictable (e.g., related to nature).

Limitations of Prompting

This section emphasizes that while prompts shape text generation, they cannot access information from alternate universes.

Limitations of Prompting

  • Prompts help shape the generated document but cannot retrieve information from alternate universes.
  • Jumping to an alternate universe and using its information is not possible.
  • Using prompts is more like running a search engine on nearby universes for relevant information.

Examples of Prompt Influence

This section provides examples of how prompts can influence the content of generated documents.

Examples of Prompt Influence

  • Writing a prompt about David Attenborough suggests that subsequent tokens will likely be related to plants or nature, rather than unrelated topics like tacos or babies.
  • Prompts guide the language model towards specific themes or subjects based on initial context.

Inability to Access Alternate Universes

This section explains the limitations of using prompts to access information from alternate universes.

Inability to Access Alternate Universes

  • While prompting narrows down possibilities, it does not grant access to alternate universes with different knowledge or solutions.
  • Mentioning a cure for cancer in another universe does not provide a valid solution in reality.
  • Prompting is more akin to searching for existing documentation rather than accessing new knowledge.

Combining Ideas through Prompts

This section highlights how prompts can be used to find combined ideas and concepts within existing documents.

Combining Ideas through Prompts

  • Prompts can help combine existing ideas that have not been explicitly connected before.
  • For example, imagining Shakespeare's Dungeons and Dragons campaign based on Hamlet combines two distinct concepts into one creative idea.

Making Wishes Come True

This section discusses the concept of making wishes come true through instruction-tuned models.

Making Wishes Come True

  • Instruction-tuned models allow users to ask for something and receive the desired output.
  • Similar to stories about genies or creatures granting wishes, these models can fulfill requests based on instructions.

Addressing Biases in Models

This section explores the potential of instruction-tuned models to reduce biases and improve self-correction.

Addressing Biases in Models

  • Instruction-tuned models aim to address biases inherited from past data by allowing direct commands.
  • Researchers are exploring ways to make these models less biased and more capable of moral self-correction.

Avoiding Biases When Prompting

The importance of avoiding biases and stereotypes when prompting language models.

Prompting Language Models

  • Pre-trained language models can sometimes exhibit biases in their responses.
  • To address this, it is important to ensure that prompts are unbiased and do not rely on stereotypes.
  • Reframing instructional prompts can significantly improve the model's performance in avoiding biases.
  • Examples of effective instructional prompts include using simple low-level patterns instead of complex descriptions and turning negation statements into assertions.

Reframing Instructional Prompts

Being precise when prompting language models and reframing instructional prompts.

Reframing Instructional Prompts

  • When instructing language models, it is important to be precise and provide clear instructions.
  • Mishra et al. suggest several ways to make instructional prompts more effective:
      • Express tasks in terms of simple low-level patterns instead of complex descriptions.
      • Craft questions that require common-sense reasoning using specific phrases like "what may happen" or "what may have caused."
      • Simplify descriptions by turning them into bulleted lists.
      • Avoid negation statements by switching them into assertions.
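
A before/after illustration of the guidelines above (the prompt text is my own invented example, not taken from Mishra et al.): the reframed version uses a bulleted list, a simple question pattern, and an assertion instead of negation.

```python
# A vague instruction with negations and a run-on description.
vague_prompt = (
    "Please try to write a question about the passage, and don't make it "
    "something that can't be answered with common sense, and don't use "
    "stereotypes."
)

# Reframed: bulleted requirements, a simple low-level pattern,
# and negations turned into assertions.
reframed_prompt = """\
Task: write one question about the passage.
Requirements:
- Use a simple pattern such as "What may happen if ..." or "What may have caused ...".
- The question should require common-sense reasoning.
- Ensure the question relies only on the passage, not on stereotypes.
"""
```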

Learning the Rules

Understanding the rules and limitations of language models.

Learning the Rules

  • Language models operate based on certain rules, similar to a genie granting wishes.
  • It is crucial to learn these rules when prompting language models effectively.
  • Following guidelines from research papers like Mishra et al.'s reframing instructional prompts can help avoid failure modes and biases in model responses.

Crafting Questions

Crafting effective questions for language models.

Crafting Questions

  • When instructing language models to craft questions, it is important to focus on common sense reasoning.
  • Instead of using lengthy descriptions, use simple patterns and phrases like "what may happen" or "what may have caused."
  • The goal is to create questions that are easy for humans but challenging for AI models.

Precise Instruction

Tips for precise instruction and avoiding biases in language models.

Precise Instruction

  • To get good performance from language models, treat them as if they were newly hired contractors with limited context or domain expertise.
  • Be precise in your instructions, similar to how you would explain a task to a contractor.
  • Bulleted lists can help improve clarity and understanding.

Negation Statements

Addressing negation statements and the challenges of training language models.

Negation Statements

  • Language models tend to struggle with negation statements like "don't do X."
  • It is recommended to rephrase instructions by stating the opposite action instead of using negation.
  • For example, instead of saying "don't be stereotyped," say "ensure your answer does not rely on stereotypes."

Treating Models as Annotators

Treating language models as annotators and the importance of precision.

Treating Models as Annotators

  • Language models should be treated as annotators rather than all-knowing entities.
  • Providing clear instructions and being precise is crucial for obtaining good performance from these models.
  • Similar to working with a team of human annotators, precision is key when designing annotation tasks.

Creating Artificial Agents

Creating artificial agents using language models.

Creating Artificial Agents

  • Language models can be used to create artificial agents, similar to the concept of a Golem in Jewish folklore.
  • These agents can follow instructions and perform tasks based on the given instructions.
  • Even early large language models like GPT-3 have the capability to take on personas.

The Role of Language Models in Translation

This section discusses how having a mental model of masterful translators can improve the performance of language models. It also mentions the use of language models to create generative agents that can simulate personas and entire video game worlds.

Language Models as Translators

  • Prompting the model to adopt the persona of a masterful translator guides its generation when translating phrases into English.
  • This approach significantly improved the performance of smaller GPT-3 models on translation tasks.

Generative Agents and Personas

  • Language models can be used to create generative agents that simulate personas based on descriptions.
  • The "Generative Agents" paper explores this concept, allowing users to describe features of a persona and instructing the model to follow that description.

Modeling Text vs. Modeling Processes

This section delves into the primary focus of language models, which is modeling text. It highlights the connection between human utterances and our environment, emphasizing that language models need to consider various processes that produce text found on the internet.

Modeling Text with Language Models

  • Language models primarily focus on modeling text, including both human and machine utterances found online.
  • To become proficient at modeling text, language models must internally simulate various processes involved in producing text.

Simulating Processes for Better Language Modeling

As language models improve at modeling text, they need to simulate processes more accurately. This section explains how language models must simulate processes like running a Python interpreter in order to predict the next word effectively.

Simulating Processes for Better Predictions

  • To excel at predicting the next word accurately, language models must internally simulate different processes.
  • For example, when reading Python program outputs, language models simulate running a Python interpreter in their "brain" before predicting the next word.

Language Models as Agent Models

This section explores the concept of language models as agent models and discusses the importance of communicative intentions and beliefs in human utterances.

Communicative Intentions and Beliefs

  • One criticism of large language models was their lack of communicative intentions.
  • Humans use language to express beliefs about the environment and desires for specific outcomes.
  • Language models need to simulate these processes by carefully choosing prompt components to predict the next token accurately.

Limitations of Universal Simulators

This section highlights limitations faced by universal simulators, such as language models, including the scope of simulation and fidelity.

Scope of Simulation

  • Language models are trained on text written by humans, limiting their ability to simulate processes beyond what humans have written about.
  • Fictional super intelligences or scenarios not present in training data cannot be accurately simulated by language models.

Fidelity of Simulation

  • Language models only learn to simulate processes well enough to solve their language modeling tasks.
  • Simulating complex processes that require deep personal context is challenging for current language models.

Simulacra Simulated by Language Models

This section provides examples of simulacra that can be effectively simulated by language models based on their level of complexity.

Simulacra Examples

  • Language models can effectively simulate human reactions on social media platforms like Twitter or Reddit.
  • However, simulating a human's deep personal context or thinking for extended periods becomes more challenging for language models.

Understanding the Limitations of Language Models

In this section, the speaker discusses the limitations of language models and their ability to simulate real-world processes accurately.

Language Models vs. Calculators

  • Language models, like calculators, can provide approximate outputs without actually understanding the underlying process.
  • However, they are not as reliable as actual calculators or human mental math.

Simulating Python Runtimes

  • Language models can guess the outcomes of simple programs but cannot perfectly simulate a Python interpreter.
  • Emulating live API calls or processes that require real-time data from the real world is challenging for language models.
  • It is recommended to replace weak simulators in language models with actual implementations whenever possible.

The Role of Humans in Simulation

  • Human thinking is currently the best simulator besides language models.
  • However, relying on humans comes with additional complexities and limitations.

Key Takeaways

  • Pre-trained language models primarily generate alternate universe documents.
  • Instruction models can answer queries but may have varying quality depending on the agent and model used.
  • Prompt engineering tricks are useful but lack depth compared to core language modeling concepts.

The Spicy Take: Prompt Engineering Tricks

This section explores prompt engineering tricks and highlights their practicality compared to core language modeling concepts.

The Misconception about Few-Shot Learning

  • Few-shot learning is not the right mental model for prompting language models.
  • The framing that generative language models "learn" new tasks on the fly from a handful of examples has not held up well in practice.

Challenges with Tokenization

  • Tokenization can cause issues when working with language models.
  • Tips and tricks for dealing with tokenization challenges will be discussed later in the lecture.

Language Models as Few-Shot Learners

  • The GPT-3 paper presented an analogy between few-shot prompting and the way language models learn during training.
  • However, the idea that language models effectively learn new tasks on the fly from examples has not held up well in practice.

The Role of Prompting

  • Language models excel when they are explicitly told what task to perform rather than learning it from examples.
  • Carefully crafting prompts can often yield excellent performance without requiring extensive context or examples.

Weird Things to Watch Out For and the Emerging Playbook

This section covers some misconceptions and emerging strategies in prompt engineering.

Avoiding Ineffective Approaches

  • Few-shot learning is not a reliable method for prompting language models.
  • Tokenization can cause challenges, but there are tips and tricks to overcome them.

The Usefulness of Language Models

  • Initially, it was unclear whether generative language models would be practically useful.
  • Language models were expected to mimic intelligence but their utility was uncertain.

The Primary Role of Prompting

  • Rather than expecting language models to learn new tasks on the fly, their primary role is to be explicitly instructed about the task at hand.
  • Crafting prompts that clearly define the desired task leads to better performance.

The Spicy Take: Prompt Engineering Tricks

  • Many prompt engineering papers focus more on benchmark marketing than providing substantial depth.
  • While these tricks are valuable tools, they do not offer significant mathematical depth compared to core language modeling concepts.

Challenges with Language Models

This section discusses some of the challenges faced by language models, particularly in terms of their ability to move away from pre-training and adapt to new tasks.

Negative Results in Model Adaptation

  • Language models struggle to move away from what they learned during pre-training.
  • For example, in sentiment analysis with permuted labels (positive and negative swapped in the demonstrations), the model may ignore the swap and classify statements according to the original labels.
  • GPT-3 demonstrated this issue by continuing to classify positive statements as positive, even when they were labeled as negative in the prompt.

Permuting Labels Task

  • More recent studies revisit the permuted-labels task across different models and sizes.
  • Larger models follow the flipped labels more often, so the fraction of predictions respecting the permutation increases with scale.
  • Even so, language models still do not perfectly adapt to permuted labels.

Tokenization Challenges

This section highlights the challenges related to tokenization in language models and how it affects their understanding of characters and words.

Tokenizing Characters vs. Tokens

  • Language models do not see individual characters but rather tokens.
  • Different surface forms of a word or phrase (capitalized, lowercased, with or without a leading space) may be tokenized differently, depending on how frequently each form appeared in the tokenizer's training data.
  • String operations like splitting or reversing words are not well-handled by language models.

Tricks for Handling Tokenization Issues

  • Adding spaces between letters changes tokenization behavior, forcing roughly one token per character.
  • Tokens with and without surrounding spaces are treated as distinct.
  • Peter Welinder proposed adding spaces as a workaround for certain tokenization issues.
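
A toy greedy longest-match tokenizer (a simplified stand-in for BPE, with an invented vocabulary) shows why the spacing trick works: a frequent word is one opaque token, while spaced-out letters become one token per character.

```python
# Invented vocabulary: frequent whole words get single tokens,
# plus single characters and the space.
vocab = {"hello", "world", " ", "h", "e", "l", "o", "w", "r", "d"}

def tokenize(text):
    """Greedy longest-match tokenization (toy stand-in for BPE)."""
    tokens, i = [], 0
    while i < len(text):
        # Take the longest vocabulary entry matching at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token for {text[i]!r}")
    return tokens

print(tokenize("hello"))      # ['hello'] — one opaque token
print(tokenize("h e l l o"))  # one token per character (plus spaces)
```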

Character-Level Challenges

This section discusses the difficulties faced by language models when dealing with character-level manipulations.

Difficulty with Character-Level Manipulations

  • Language models struggle with tasks like reversing words or other character-level operations.
  • Despite their capabilities in creative writing and summarization, they are not proficient in character-level manipulations.

Traditional Programming Approach

  • If a task can be accomplished using traditional programming methods, it is recommended to use those instead of relying on language models.

Playbook for Using Language Models

This section provides key tricks and strategies for effectively utilizing language models.

Formatted Text Advantage

  • Language models excel at predicting formatted text.
  • Utilizing well-formatted prompts increases the model's ability to generate accurate responses.

Tricks for Effective Usage

  • Riley Goodside shared innovative examples highlighting the power of well-formatted text prompts.
  • Leveraging the strengths of language models can lead to successful outcomes in various applications.

Structured Text and Pseudo Code

The speaker discusses the use of structured text, such as pseudo code, to provide a more organized format for language models. This allows models to generate code snippets or perform specific tasks based on the given structure.

Using Structured Text

  • Language models can make use of structured text that is not as rigorously structured as JSON or YAML but still provides a clear format for generating code or performing tasks.
  • By using structured text like pseudo code, language models can better understand and generate desired outputs.
  • Triple backticks (```) are important in markdown as they indicate that something is going to be code or pseudo code. This helps the model recognize the context and generate appropriate responses.
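
A sketch of such a structured prompt (the task and function name are invented for illustration); the backtick fence is assembled programmatically only so it can be shown inside this document.

```python
# Markdown structure — a header plus a triple-backtick fence — signals to the
# model that what follows is code to be completed.
fence = "`" * 3  # triple backticks, built programmatically for display here

prompt = f"""\
## Task
Complete the Python function below.

{fence}python
def median(xs: list[float]) -> float:
    \"\"\"Return the median of a non-empty list.\"\"\"
{fence}
"""
print(prompt)
```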

Decomposition and Prompt Engineering

  • Decomposing tasks into smaller pieces can help language models understand and generate accurate outputs.
  • Breaking down a task into smaller subtasks can trigger the prompting of another language model or an external tool.
  • By automating the construction of decomposition in prompts, it becomes easier to train language models without manually writing out extensive examples.
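
The decomposition idea can be sketched as follows; `llm` is a stub standing in for a real model-API call, and the question and sub-questions are illustrative.

```python
def llm(prompt: str) -> str:
    """Stub for a real model call, so the control flow is runnable."""
    return f"<answer to: {prompt.splitlines()[-1]}>"

def decompose_and_answer(question: str, subquestions: list[str]) -> str:
    """Answer each sub-question, then feed the sub-answers back as context."""
    facts = [f"Q: {sub}\nA: {llm(sub)}" for sub in subquestions]
    context = "\n".join(facts)
    return llm(f"{context}\nUsing the answers above: {question}")

answer = decompose_and_answer(
    "How old was the current US president when the iPhone launched?",
    ["Who is the current US president?",
     "When was the current US president born?",
     "When did the iPhone launch?"],
)
print(answer)
```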

Reasoning with Prompt Engineering

  • Reasoning with prompt engineering involves providing reasoning steps in prompts to guide language models' thought process.
  • By including reasoning steps in prompts, language models are encouraged to think step-by-step and generate intermediate thoughts before arriving at a final answer.
  • This technique works well for mathematical tasks involving multiple steps and helps elicit reasoning capabilities from the model.
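
A few-shot chain-of-thought prompt looks like the sketch below: the first Q/A pair is the widely cited tennis-ball demonstration from the chain-of-thought literature, and the trailing "Let's think step by step." is the zero-shot variant of the same trick.

```python
# Few-shot chain-of-thought prompt: the demonstration shows its work, so the
# model imitates step-by-step reasoning before giving a final answer.
cot_prompt = """\
Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. How many balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. They used 20 and bought 6 more. How many apples do they have?
A: Let's think step by step.
"""
print(cot_prompt)
```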

Checking Work with Recursive Criticism

  • Recursive criticism involves asking the model to check its own work after generating an initial response.
  • This two-stage prompting approach allows for iterative improvement by generating examples, using prompt engineering techniques like "let's think step by step," and then evaluating the generated output.
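
The two-stage check can be sketched as a generate/critique/revise loop; `llm` is a stub standing in for a real model call, and the prompt wording is illustrative.

```python
def llm(prompt: str) -> str:
    """Stub for a real model call, so the control flow is runnable."""
    return "stub response"

def answer_with_self_check(question: str) -> str:
    # Stage 1: generate a draft with step-by-step reasoning.
    draft = llm(f"{question}\nLet's think step by step.")
    # Stage 2: ask the model to criticize its own draft.
    critique = llm(
        f"Question: {question}\nDraft answer: {draft}\n"
        "Review the draft above and list any mistakes."
    )
    # Stage 3: revise using the critique.
    return llm(
        f"Question: {question}\nDraft: {draft}\nCritique: {critique}\n"
        "Write an improved final answer."
    )

final = answer_with_self_check("What is 17 * 24?")
```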

Conclusion and Final Thoughts

The speaker concludes by discussing the importance of prompt engineering in training language models. They highlight the effectiveness of techniques like decomposition, reasoning, and recursive criticism in improving model performance.

Prompt Engineering for Model Improvement

  • Prompt engineering plays a crucial role in improving language models' performance.
  • Techniques like decomposition, reasoning, and recursive criticism help elicit desired behaviors from the model.
  • By carefully tuning prompts and providing structured guidance, language models can generate more accurate and reliable outputs.

Fixing Outputs and Ensembling

In this section, the speaker discusses using models to fix their outputs and the power of ensembling multiple model results.

Using Models to Fix Outputs

  • The speaker mentions that using models to fix their outputs is a powerful technique in prompting.
  • This approach involves utilizing models to generate responses and then making adjustments or corrections as needed.

Ensembling Multiple Models

  • Ensembling refers to combining the results of multiple models.
  • By generating multiple outputs from different runs of a probabilistic program, it becomes possible to identify the right answer based on its higher probability compared to wrong answers.
  • Ensembling works well for problems where there are many ways to get wrong answers but only a few ways to reach the correct answer.
  • The process involves taking the outputs of multiple models (e.g., 50 different responses) and performing majority voting.
  • Increasing the number of generations or members in the ensemble generally improves performance.
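
A self-consistency sketch of this majority vote, with a simulated solver in place of real model samples: wrong answers scatter, the right answer repeats, so it wins the vote.

```python
import random
from collections import Counter

def noisy_solver(rng: random.Random) -> int:
    """Stand-in for one sampled model run: returns the right answer (42)
    60% of the time, otherwise a scattered wrong answer."""
    return 42 if rng.random() < 0.6 else rng.randint(0, 100)

rng = random.Random(0)  # fixed seed for reproducibility
samples = [noisy_solver(rng) for _ in range(50)]

# Majority vote over the ensemble of 50 generations.
answer, votes = Counter(samples).most_common(1)[0]
print(answer)  # 42 — wrong answers are too scattered to outvote it
```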

Injecting Randomness and Combining Techniques

In this section, the speaker discusses injecting randomness for greater heterogeneity and combining various techniques for improved performance.

Injecting Randomness for Heterogeneity

  • To increase heterogeneity among model responses, one approach is to inject randomness into prompts. This can be done through operations like lowercase/uppercase conversions, which slightly alter the model's behavior.
  • Injecting randomness helps keep correct answers consistent while significantly changing wrong answers.
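
A sketch of this kind of superficial perturbation: randomly flipping character case leaves the task unchanged but gives the model a slightly different input on each run, decorrelating ensemble members.

```python
import random

def perturb_case(prompt: str, rng: random.Random) -> str:
    """Randomly uppercase ~10% of characters; the content is unchanged."""
    return "".join(
        ch.upper() if rng.random() < 0.1 else ch.lower() for ch in prompt
    )

rng = random.Random(0)
variants = [perturb_case("What is 17 * 24?", rng) for _ in range(3)]
# Each variant asks the same question with slightly different casing.
print(variants)
```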

Combining Techniques

  • Various techniques discussed so far can be combined for better performance.
  • Few-shot examples can be used along with step-by-step thinking and ensembled together.
  • A combination of few-shot chain-of-thought prompting and "let's think step by step" matches average human performance on challenging benchmarks such as BIG-Bench Hard.

Impact on Latency and Compute

In this section, the speaker discusses the impact of different techniques on latency and compute costs.

Impact on Latency and Compute

  • Few-shot Chain of Thought increases latency as it requires more information for the model to process, resulting in longer generation times.
  • Zero-shot Chain of Thought has less impact on latency and compute as it adds fewer things to the context.
  • Decomposing into subproblems generally increases the length and can be done using demonstration examples. It also affects latency.
  • Ensembling has no impact on latency but increases compute costs, especially when running parallel requests with an API service.

Prompt Engineering as a Bag of Tricks

In this section, the speaker mentions skipping an example with theory of mind and highlights prompt engineering as a collection of tricks.

Prompt Engineering

  • Prompt engineering is a playbook consisting of various tricks to improve performance.
  • There is no hardcore math explaining why these tricks work; it's more like a bag of tricks approach.
  • Fiddliness of prompts should be taken into account, especially when comparing different approaches.

Key Takeaways

In this section, the speaker concludes by emphasizing that prompt engineering is a collection of tricks without a definitive mathematical explanation.

Key Takeaways

  • Prompt engineering involves utilizing a set of tricks to enhance model performance.
  • There is no specific mathematical theory supporting these techniques; they are based on practical experience and experimentation.
  • It's important to be aware of the intricacies involved in working with prompts, particularly when making comparisons between different approaches.
Video description

New course announcement ✨ We're teaching an in-person LLM bootcamp in the SF Bay Area on November 14, 2023. Come join us if you want to see the most up-to-date materials building LLM-powered products and learn in a hands-on environment. https://www.scale.bythebay.io/llm-workshop Hope to see some of you there!

In this video, Charles gives high-level intuitions and a default playbook for prompting language models. We consider two very different sources of intuition: "language models are statistical models of text" and "prompts are magic spells". Then, we review prompting techniques, like decomposition, reasoning, and reflection.

Download slides from the bootcamp website here: https://fullstackdeeplearning.com/llm-bootcamp/spring-2023/prompt-engineering/

Intro and outro music made with Riffusion: https://github.com/riffusion/riffusion

Watch the rest of the LLM Bootcamp videos here: https://www.youtube.com/playlist?list=PL1T8fO7ArWleyIqOy37OVXsP4hFXymdOZ

00:00 Intro
02:15 Language models are statistical models of text
04:50 But "statistical model" gives bad intuition
08:35 Prompts are magic spells
09:57 Prompts are portals to alternate universes
16:42 A prompt can make a wish come true
23:03 A prompt can create a golem
27:10 Limitations of LLMs as simulators
32:17 Prompting techniques are mostly tricks
33:40 Few-shot learning isn't the right model for prompting
37:40 Character-level operations are hard
40:23 The prompting playbook: reasoning, reflection, & ensembling
