Generative AI Foundations on AWS | Part 3: Prompt engineering and fine-tuning
Introduction to Pre-Trained Foundation Models
In this section, Emily Webber introduces the topic of pre-trained foundation models and outlines the learning objectives for the session.
Understanding Foundation Models
- Foundation models are pre-trained models that can be used for various natural language processing tasks.
- The session will cover how to pick a foundation model and how to use pre-trained models through prompt engineering and fine-tuning.
Activities Covered in the Session
- Exploring zero-shot, single-shot, and few-shot prompting techniques.
- Understanding instruction fine-tuning and its importance.
- Exploring different ways of using instruction prompting techniques.
- Hands-on walkthrough of foundation models using SageMaker JumpStart.
Prompt Engineering and Fine-Tuning
This section focuses on prompt engineering, which involves constructing effective prompts for the model, and fine-tuning techniques.
Prompt Engineering
- Prompt engineering is the practice of refining prompts until they produce optimal responses from the model.
- Using prompt templates can help guide prompt construction for different downstream tasks.
- Syntax hacking involves understanding the preferred syntax of the model by experimenting with different words or phrases.
Zero-Shot vs Single-Shot vs Few-Shot Learning
- Zero-shot learning means sending only an instruction as a prompt without additional context.
- Single-shot learning involves providing a specific example or context along with the instruction in the prompt.
- Few-shot learning refers to providing multiple examples or contexts along with the instruction in the prompt.
Retrieval augmented generation (RAG) techniques will be covered in a separate video.
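The three prompting styles above differ only in how many worked examples precede the instruction. A minimal sketch (the sentiment task and labels here are illustrative, not from the session):

```python
def build_prompt(instruction, examples=()):
    """Assemble a prompt: zero-shot (no examples), single-shot (one example),
    or few-shot (several examples) depending on how many pairs are supplied."""
    parts = [f"Input: {text}\nOutput: {label}" for text, label in examples]
    parts.append(instruction)
    return "\n\n".join(parts)

# Zero-shot: the instruction alone, with no added context.
zero = build_prompt("Classify the sentiment of: 'I loved this film.'")

# Single-shot: one worked example precedes the instruction.
one = build_prompt(
    "Classify the sentiment of: 'I loved this film.'",
    examples=[("The plot dragged on.", "negative")],
)

# Few-shot: several worked examples precede the instruction.
few = build_prompt(
    "Classify the sentiment of: 'I loved this film.'",
    examples=[("The plot dragged on.", "negative"),
              ("A stunning soundtrack.", "positive")],
)
```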
Understanding Prompts and Prompt Engineering
This section delves deeper into understanding prompts and explains what prompt engineering entails.
What is a Prompt?
- A prompt is the input sent to the model, which can be a question, instruction, or any other form of text.
- Examples of prompts include asking for a story, solving a riddle, or requesting an image description.
Prompt Engineering
- Prompt engineering involves refining prompts to achieve optimal responses from the model.
- The goal is to create a prompt that elicits a perfect response on the first try.
- Using prompt templates and understanding the preferred syntax of the model can aid in prompt engineering.
Creating an Optimal Prompt Template
This section focuses on creating an optimal prompt template and providing a seamless user experience.
Prompt Templates
- Prompt templates are files that contain examples of prompts for different downstream tasks.
- They help guide prompt construction and ensure consistent results for specific tasks like summarization or classification.
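A prompt template can be as simple as a format string per downstream task; the template wordings below are hypothetical, and the phrasing a given model prefers is found through experimentation:

```python
# Hypothetical prompt templates keyed by downstream task. The exact wording a
# given model responds to best is discovered by experimentation ("syntax hacking").
PROMPT_TEMPLATES = {
    "summarization": "Summarize the following text in one paragraph:\n\n{text}",
    "classification": "Text: {text}\nQuestion: Which category does this text belong to?\nAnswer:",
    "translation": "Translate the following text into {language}:\n\n{text}",
}

def render_prompt(task, **fields):
    """Fill a task's template with the caller's fields."""
    return PROMPT_TEMPLATES[task].format(**fields)

prompt = render_prompt("translation", language="British English",
                       text="I parked my car and took out the trash.")
```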
Syntax Hacking
- Syntax hacking involves experimenting with different words or phrases to understand the preferred syntax of the model.
- It helps in constructing prompts that yield better responses from the model.
Application integration, human feedback, and pre-training will be covered in later sections.
Introduction to Few-Shot Learning and Prompt Tuning
In this section, the speaker introduces the concepts of few-shot learning and prompt tuning. Few-shot learning involves giving the model multiple examples in the prompt so it can perform tasks such as translation or summarization. Prompt tuning is a more involved technique that trains task-specific vectors using backpropagation.
Few-Shot Learning
- Few-shot learning allows models to translate between languages, perform nuanced summarization, or transfer styles.
- It involves providing multiple examples in the prompt, each including the body and the instruction.
- No model weights are updated; the model picks up the task pattern from the examples in context.
Prompt Tuning
- Prompt tuning is a technique for adapting models without updating all of their weights.
- It trains task-specific vectors ("soft prompts") using backpropagation while the base model stays frozen.
- Instruction fine-tuning is a related technique in which the model is fine-tuned on instructions using supervised learning.
Importance of Instruction-Tuned Models
- Instruction-tuned models make working with language models easier.
- Generic language models like GPT-2 or GPT-3 may not be instruction-tuned.
- Before using a model, check whether it has been instruction-tuned; this strongly affects how well it follows prompts.
Examples of Before and After Instruction Tuning
This section provides examples comparing the output before and after instruction tuning.
Example 1: Why was six afraid of seven?
Before Instruction Tuning:
- The answer generated by GPT-2 is nonsensical and does not address the question.
Example 2: What's the difference between a mimosa and a samosa?
Before Instruction Tuning:
- The answer generated by GPT-2 hallucinates a word and does not provide a sensible response.
After Using AI21 Jurassic-2 Jumbo Instruct:
The answers generated are closer to the correct responses, demonstrating the effectiveness of instruction tuning.
Importance of Instruction Fine-Tuning
This section emphasizes the importance of using instruction fine-tuned models.
- Instruction fine-tuning ensures that models are capable of following instructions.
- It unlocks tasks like translation, summarization, and extraction.
- Models that have undergone instruction fine-tuning follow instructions more reliably and respond more accurately.
- Alternatively, you can start with a model that hasn't been instruction fine-tuned and perform your own instruction fine-tuning.
Instruction Fine-Tuning Process
This section explains the process of instruction fine-tuning.
- Instruction fine-tuning uses supervised learning to adapt model behavior.
- Develop a dataset of prompts and corresponding answers.
- Perform supervised fine-tuning on these instructions using labeled datasets.
- Start with a base model that hasn't been instruction fine-tuned before performing your own instruction fine-tuning.
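The steps above can be illustrated with a tiny dataset of prompt/answer pairs; the JSONL layout and field names here are a common convention, assumed for illustration rather than required by any specific trainer:

```python
import json

# A tiny illustrative instruction-tuning dataset: each record pairs a prompt
# (the instruction) with the desired answer. JSONL is used here as a common
# convention, not a requirement of any particular training framework.
records = [
    {"prompt": "Summarize: Pugs are a small companion breed with a long history...",
     "completion": "Pugs are small companion dogs with a long history."},
    {"prompt": "Translate to French: good morning",
     "completion": "bonjour"},
]

jsonl = "\n".join(json.dumps(r) for r in records)

# A supervised fine-tuning job would then minimize the loss of generating each
# completion conditioned on its prompt.
```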
Zero Shot Prompting and Customer Experience
This section discusses the concept of zero shot prompting and its importance in providing a seamless customer experience. The goal is for the model to answer questions without any additional prompts or examples.
- Zero shot prompting aims for a customer experience where users can easily ask a question and get an answer without any additional effort.
- The ideal scenario is when the model can provide detailed answers based solely on the given prompt, without requiring any previous examples.
- The customer experience should be smooth, allowing users to quickly obtain the information they need.
Zero Shot Prompting Example: Nachos Ingredients
This section provides an example of zero shot prompting using nachos ingredients as an illustration.
- Zero shot prompting involves sending a prompt and receiving an answer without any prior examples.
- An example prompt could be asking about the basic ingredients of nachos.
- The model should provide a detailed response that includes information about tortilla chips, cheese, jalapeno peppers, and optional additions like guacamole, sour cream, salsa, and ground beef.
- Nachos are typically served as appetizers before main entrees or during events like Super Bowl watch parties.
Stable Diffusion and Detailed Answers
This section introduces Stable Diffusion as another approach to obtaining detailed answers from prompts.
- Stable Diffusion also involves sending a prompt, but the response is an image instead of text.
- In this case, Stable Diffusion generates high-quality images based on the given prompt.
- An example is provided of a cute panda bear image generated by Stable Diffusion with photorealistic detail.
Single Shot Prompting vs Few Shot Prompting
This section compares single shot prompting and few shot prompting as alternative approaches when zero shot prompting fails.
- Single shot prompting involves adding an example to the prompt when zero shot prompting does not provide the desired answer.
- An example is given where a main course suggestion is requested, and an appetizer (spinach dip) is provided as an example in the prompt.
- The model is expected to fill in the blank and suggest a main course based on the given context.
- Few shot prompting goes a step further by including multiple examples in the prompt.
- An example prompt includes various topics related to data analysis, machine learning, and Python packages.
- The goal is for the model to learn from these examples and generate appropriate responses.
Prompt Engineering for Summarization
This section discusses how prompt engineering can be used for summarization tasks.
- Prompt engineering involves pasting the entire document or text that needs to be summarized into the prompt.
- An additional instruction at the bottom of the prompt guides the model to perform summarization.
- An example is provided with paragraphs about pug dogs being pasted into the prompt, followed by an instruction to summarize.
- The model then generates a summary of the article about pugs, including their physical features, history, and popularity.
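The paste-then-instruct pattern described above can be sketched as follows (the article text and instruction wording are illustrative):

```python
def summarization_prompt(document,
                         instruction="Summarize the article above in a few sentences."):
    """Paste the full document first, then append the instruction at the bottom."""
    return f"{document}\n\n{instruction}"

article = ("Pugs are a breed of dog with a wrinkly, short-muzzled face and a "
           "curled tail. The breed has a long history as a companion dog.")
prompt = summarization_prompt(article)
```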
Fact Checking Generated Content
This section emphasizes the importance of fact-checking content generated by models.
- Manually fact-checking generated content helps ensure the accuracy and reliability of the information.
- In this case, the speaker fact-checked the model-generated content about pugs as they read through the summary.
Fact Checking and Entity Extraction
In this section, the speaker discusses the process of fact checking and entity extraction using NER (Named Entity Recognition) packages. They emphasize the importance of verifying information based on semantic meaning rather than relying solely on generated text.
Fact Checking Process
- Extract entities using NER packages.
- Perform a quick check to determine if the generated text contains factual inaccuracies.
- For example, verify that a mention like the British royal family actually appears in the source prompt and is not fabricated.
Importance of Semantic Meaning
- When fact checking or term checking, consider whether the information aligns with the semantic meaning of the prompt.
- This approach helps mitigate model hallucinations and improves accuracy.
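A minimal sketch of the entity check, using a crude capitalized-span matcher as a stand-in for a real NER package (a production fact-checking pipeline would use an actual NER library):

```python
import re

def extract_entities(text):
    """Crude stand-in for an NER package: collect capitalized spans.
    A real pipeline would use a proper NER library instead."""
    return set(re.findall(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*\b", text))

def unsupported_entities(source_text, generated_text):
    """Entities the model mentioned that never appear in the source prompt --
    candidates for hallucination worth a manual check."""
    return extract_entities(generated_text) - extract_entities(source_text)

source = "Pugs were prized by Chinese emperors and later became popular in Europe."
generated = "Pugs were favorites of Queen Victoria and Chinese emperors."
flagged = unsupported_entities(source, generated)  # {'Queen Victoria'}
```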
Example: Pug Dog Image
- The speaker shares an adorable image of a pug dog with a hashtag #puglife.
- Although Queen Victoria was not present in the image, they still find it amusing and relevant to share.
Prompt Engineering for Classification
This section focuses on using prompt engineering techniques to solve classification tasks. The speaker explains how to structure prompts by including relevant information from documents and posing specific questions to obtain accurate classification results.
Prompt Engineering for Classification
- Gather all relevant information about an object, person, or feature.
- Include this information in the prompt along with a specific question related to classification.
- By inputting comprehensive data into the model, accurate classifications can be obtained.
Example: Describing a Bear
- The speaker provides an example where they describe a bear's characteristics in detail within the prompt.
- They then ask the model what type of bear it is, expecting it to correctly identify the description as a koala.
Prompt Engineering for Translation
This section explores the use of prompt engineering to solve translation tasks. The speaker demonstrates how to structure prompts for translating text into different languages, using British English as an example.
Prompt Engineering for Translation
- Follow the same prompt engineering flow as in previous sections.
- Input the text to be translated into the prompt and specify the desired target language.
- The model will generate a translation based on the provided instructions.
Example: Translating Text to British English
- The speaker inputs a sentence about parking a car, taking out trash, and putting an umbrella in the trunk.
- They instruct the model to translate this text into British English.
- The resulting translation includes British terms such as "parked my car on the drive" and "took out the rubbish."
The speaker notes that "brolly" (British slang for umbrella) might not be a hallucination; they mention that the term is said to date from World War II, when umbrellas were distributed by the government.
Conclusion
The transcript covers three main topics: fact checking and entity extraction, prompt engineering for classification, and prompt engineering for translation. Each section provides insights into utilizing these techniques effectively.
Prompt Engineering and the Case for Fine-Tuning
In this section, the speaker discusses the concept of prompt engineering and its role in fine-tuning models.
Prompt Engineering and Fine-Tuning
- Prompt engineering can reframe a prompt, changing its background or modality, while preserving the same underlying meaning.
- Fine-tuning is a technique used to improve model performance after exhausting other methods like zero-shot learning and instruction fine-tuning.
- Fine-tuning with a good foundation-based model is expected to enhance model quality in a meaningful way.
- However, there may be a gap between how well prompt engineering improves performance and what customers desire.
- Fine-tuning helps bridge this gap by creating a new model artifact that belongs to the user, allowing them to use it anywhere they want.
- An example of a fine-tuning technique is DreamBooth, which fine-tunes an image model on pictures of a subject, enabling image-to-image transformation and prompting on top of the fine-tuned model.
Fine-Tuning Models with New Data
This section explores the process of fine-tuning models using new data and its benefits.
Benefits of Fine-Tuning
- Fine-tuning allows users to incorporate new data into their models and achieve desired outcomes.
- By uploading multiple pictures of an object or subject, such as a dog, users can fine-tune Stable Diffusion models on those images.
- After fine-tuning, users can provide prompts related to different settings or scenarios, resulting in realistic outputs that maintain key features of the input data.
- Fine-tuning is an effective way to make models perform well on specific tasks that users care about.
Efficient Fine-Tuning Techniques
This section delves into various techniques for fine-tuning models efficiently.
Efficient Techniques for Fine-Tuning
- Parameter-efficient fine-tuning, such as low-rank adaptation and prefix tuning, allows users to inject trainable weights into a large language model (LLM) while minimizing resource requirements.
- Transfer learning is another approach that involves adding additional layers on top of an existing neural network and training only those added layers.
- Classic fine-tuning involves adding a specific task-related head to a pre-trained LLM and fine-tuning the classification or generation capabilities.
- Continued pre-training is a technique used when unsupervised data is available, allowing the model to learn the syntax, style, and characteristics of the dataset.
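The low-rank adaptation idea mentioned above can be sketched numerically: freeze the large weight matrix and learn only a small rank-r correction. This is a conceptual sketch with illustrative shapes, not the PEFT library API:

```python
import numpy as np

# Conceptual sketch of low-rank adaptation (LoRA): the large pre-trained weight
# matrix W stays frozen, and only a rank-r correction B @ A is trained.
# Shapes are illustrative.
d_out, d_in, r = 1024, 1024, 8

W = np.random.randn(d_out, d_in)       # frozen pre-trained weights
A = np.random.randn(r, d_in) * 0.01    # trainable rank-r factor
B = np.zeros((d_out, r))               # trainable; zero init keeps the initial update at 0

def adapted_forward(x):
    """Forward pass with the low-rank update folded in: (W + B @ A) @ x."""
    return W @ x + B @ (A @ x)

frozen_params = W.size                 # parameters left untouched
trainable_params = A.size + B.size     # parameters actually trained
# Here trainable_params / frozen_params is about 1.6% -- the point of the method.
```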
Fine-tuning Models for Downstream Use Cases
In this section, the speaker discusses the process of fine-tuning models for different downstream use cases and introduces parameter efficient fine-tuning techniques.
Parameter Efficient Fine-tuning
- Parameter-efficient fine-tuning updates only a small number of parameters of a massive pre-trained model.
- It saves computational resources and avoids training the entire model from scratch.
- The library used for parameter-efficient fine-tuning is PEFT from Hugging Face.
Using LoRA to Inject Trainable Weights
- LoRA (low-rank adaptation), available through Hugging Face's PEFT library, injects trainable low-rank weights into a pre-trained model for fine-tuning.
- This approach eliminates the need to train the entire massive model and instead focuses on training the injected weights.
Example Code with the Transformers and PEFT Libraries
- Import the necessary modules, including AutoModel from transformers and LoraConfig and get_peft_model from peft.
- Specify the desired model name and tokenizer.
- Set up the LoRA configuration with the task type and inference mode.
- Load the pre-trained model from a local path or download it from the Hugging Face Hub.
- Pass the loaded model and the LoRA configuration to get_peft_model to obtain a model whose trainable parameters are just the injected weights.
Demo: Fine-Tuning a GPT-J Model with SEC Filings
In this section, the speaker demonstrates how to fine-tune a GPT-J 6-billion-parameter model using SEC filings from Amazon as an example dataset.
Running the Notebook in SageMaker Studio
- The notebook is available in SageMaker examples under "intro to Amazon Algos jumpstart."
- The notebook provides clear before-and-after comparisons of model performance.
Choosing an Appropriate Instance Type
- It is recommended to use larger instance types like M5 when running notebooks due to their better compute capabilities.
Fine-Tuning the Model with SEC Filings
- The notebook fine-tunes a GPT-J 6-billion-parameter model using SEC filings from Amazon.
- The model's performance is evaluated before and after fine-tuning, showing a significant improvement in generating natural language in the style of SEC filings.
Conclusion
The transcript covers the process of fine-tuning models for downstream use cases, including parameter efficient fine-tuning techniques. It also provides a demonstration of fine-tuning a GPT model using SEC filings as an example dataset.
Retrieving the Model URI
In this section, the speaker explains how to retrieve the model URI using SageMaker. The model URI is used to host the model.
Retrieving the Model URI
- The speaker explains that by providing the name and ID of the model, SageMaker can retrieve the corresponding model image.
- The retrieved model URI is used as the image for hosting the model.
- There are different scopes available, such as "inference" for getting the hosted version of the model or "training" for accessing the training version.
- The SageMaker model is created with parameters like image, model data (stored in S3), AWS role, predictor class, and endpoint name.
Querying and Parsing Endpoint Response
This section focuses on querying an endpoint and parsing its response.
Querying and Parsing Endpoint Response
- The speaker demonstrates how to query an endpoint using a JSON payload.
- Parameters like max length and number of return sequences are set for querying.
- The response from the endpoint is parsed to obtain results.
- In this example, three prompts are provided to see what a generic GPT-J 6-billion-parameter model returns.
- The initial output may not make sense or be of high quality.
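Building the payload and parsing the response can be sketched as below; the JSON field names vary by model, so the keys here are assumptions, and the parsing is demonstrated on a mocked response rather than a live endpoint (a live call would go through `predictor.predict`):

```python
import json

def build_payload(prompt, max_length=100, num_return_sequences=3):
    """JSON payload for a text-generation endpoint. Field names vary by model,
    so the keys used here are illustrative assumptions."""
    return json.dumps({
        "text_inputs": prompt,
        "max_length": max_length,
        "num_return_sequences": num_return_sequences,
    })

def parse_response(body):
    """Pull the generated strings out of the endpoint's JSON response body."""
    return json.loads(body)["generated_texts"]

payload = build_payload("This Form 10-K report shows")

# A live call would be `predictor.predict(payload)`; here the parsing is shown
# on a mocked response body.
mock_body = json.dumps({"generated_texts": ["This Form 10-K report shows revenue grew."]})
texts = parse_response(mock_body)
```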
Improving Model Quality through Fine-tuning
This section discusses improving the quality of a pre-trained model through fine-tuning.
Improving Model Quality through Fine-tuning
- Initially, the output from querying may not be of high quality.
- However, by fine-tuning a pre-trained GPT-J 6-billion-parameter model on a new SEC dataset, better-quality outputs can be achieved.
- Domain adaptation or continued pre-training is used, where the raw data is taken without explicit labeling.
- Training artifacts, including the training image and parameters, are retrieved.
- Hyperparameters for the training job and algorithm are set.
- Automatic model tuning is used to optimize hyperparameters.
Exploring Default Hyperparameters
This section explores the default hyperparameters of the fine-tuned model.
Exploring Default Hyperparameters
- The speaker retrieves the default hyperparameters for the fine-tuned model.
- Parameters like epochs, learning rate, learning-rate warm-up, instruction tuning, and train-from-scratch settings are examined.
- Automatic model tuning settings are updated.
Using Automatic Model Tuning
This section explains how to use automatic model tuning for optimizing hyperparameters.
Using Automatic Model Tuning
- The speaker demonstrates using automatic model tuning for optimizing hyperparameters.
- Maximum jobs and parallel jobs settings are specified.
- The training process begins with all configurations set.
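The tuning setup can be summarized as plain data; the names below mirror the kinds of values set in the notebook but are illustrative, not the exact SageMaker API:

```python
# Plain-data sketch of an automatic model tuning setup: search ranges for the
# hyperparameters being optimized, plus limits on how many jobs may run.
# Names and values are illustrative, not the exact SageMaker API.
tuning_config = {
    "hyperparameter_ranges": {
        "learning_rate": {"min": 1e-6, "max": 1e-4, "scale": "log"},
        "epochs": {"values": [1, 2, 3]},
    },
    "objective_metric": "eval_loss",
    "objective_type": "Minimize",
    "max_jobs": 6,           # total training jobs the tuner may launch
    "max_parallel_jobs": 2,  # jobs allowed to run at the same time
}

def total_waves(cfg):
    """Sequential waves of jobs implied by the parallelism limit (ceiling division)."""
    return -(-cfg["max_jobs"] // cfg["max_parallel_jobs"])
```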
JumpStart Foundation Models and Domain Adaptation
In this section, the speaker discusses jumpstart foundation models and domain adaptation.
JumpStart Foundation Models and Domain Adaptation
- The training script for JumpStart foundation models is stored in an S3 bucket.
- The training instance type for the model is a G5 instance with several GPUs.
- The training job takes about an hour to run even without automatic model tuning.
- While the job is running, enabling scrolling for outputs reduces the chances of notebook crashes.
- The job can be viewed in the console, where analytics for the job can also be loaded.
- After selecting the best model, it can be deployed onto another G5 instance using the same software framework.
- A fine-tuned predictor is created and validated through endpoints in SageMaker.
- The results of fine-tuning on a specific dataset are impressive.
Running Jobs and Data Locations
In this section, the speaker explores running jobs and data locations.
Running Jobs and Data Locations
- Running one JumpStart text generation model took about an hour on a g5.12xlarge instance.
- The data used for training is also stored in the JumpStart bucket.
- Automatic model tuning or running without it are both viable options.
- SageMaker provides training job analytics to analyze job performance.
Interacting with Deployed Models
In this section, the speaker demonstrates interacting with deployed models.
Interacting with Deployed Models
- Endpoints are queried with a JSON payload, and the responses are parsed.
- Parameters such as top-k, top-p, temperature, and the prompt are used to send requests to the model.
- The generated output from fine-tuned models closely resembles SEC reported data.
Conclusion and the Power of Fine-Tuning Models
In this section, the speaker concludes the video and highlights the power of fine-tuning models.
Conclusion and Power of Fine-Tuning Models
- Fine-tuning a comparatively small model like GPT-J on a specific dataset can yield powerful results.
- The speaker hopes that viewers enjoyed the video and hints at what will be covered in the next one.