Getting started with deploying foundation models on Amazon SageMaker | Amazon Web Services

How to Deploy a Hugging Face Model on SageMaker Endpoints

Section Overview

This section provides a step-by-step guide to deploying a Hugging Face model on an Amazon SageMaker endpoint, covering environment setup, deployment, and running inference.

Introduction to Deployment

  • Emily introduces the topic of deploying Hugging Face models on SageMaker endpoints, emphasizing that users can host their own models in addition to pre-packaged ones from SageMaker JumpStart.
  • The demonstration focuses on deploying a vision question answering (VQA) model, which will identify objects in an image provided by the user.

Setting Up the Environment

  • Users are instructed to upgrade pip and install the SageMaker Python SDK as initial steps for setting up their environment.
  • The process involves creating a HuggingFaceModel object using the model ID from Hugging Face, along with specifying versions for transformers, PyTorch, and Python.
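The setup steps above can be sketched as follows. The model ID, task, and framework versions here are illustrative assumptions (the video does not name them); substitute the values for your own model, and run `pip install --upgrade pip sagemaker` first as described.

```python
# Sketch of creating a HuggingFaceModel from a Hugging Face Hub model ID.
# Model ID, task, and version pins below are assumptions, not from the video.

def hub_env(model_id, task):
    """Environment variables telling the Hugging Face container which model to load."""
    return {"HF_MODEL_ID": model_id, "HF_TASK": task}

def create_model(role):
    """Build a HuggingFaceModel with example framework versions."""
    from sagemaker.huggingface import HuggingFaceModel  # SageMaker Python SDK
    return HuggingFaceModel(
        env=hub_env("dandelin/vilt-b32-finetuned-vqa", "visual-question-answering"),
        role=role,                      # IAM execution role
        transformers_version="4.26",    # example versions; use a combination
        pytorch_version="1.13",         # supported by your SDK release
        py_version="py39",
    )

if __name__ == "__main__":
    import sagemaker
    model = create_model(sagemaker.get_execution_role())
```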

Deploying the Model

  • Once the HuggingFaceModel object is created, it is deployed to an endpoint with huggingface_model.deploy, targeting a CPU-based ml.m5 instance.
  • Serialization and deserialization objects are also passed during deployment to facilitate data handling when running inference.
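A minimal sketch of the deploy step, assuming an ml.m5.xlarge instance, a single instance, and JSON serialization (the exact instance size and serializer classes are assumptions consistent with the bullets above):

```python
# Deployment sketch: one CPU-based m5 instance with JSON (de)serializers.

def endpoint_config(instance_type="ml.m5.xlarge", count=1):
    """Deployment settings; ml.m5.xlarge is an assumed CPU instance size."""
    return {"initial_instance_count": count, "instance_type": instance_type}

def deploy_model(huggingface_model):
    """Deploy the model to a real-time endpoint and return the predictor."""
    from sagemaker.serializers import JSONSerializer
    from sagemaker.deserializers import JSONDeserializer
    return huggingface_model.deploy(
        serializer=JSONSerializer(),      # turns request dicts into JSON
        deserializer=JSONDeserializer(),  # parses JSON responses into dicts
        **endpoint_config(),
    )
```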

Running Inference

  • An image named "spheres.jpeg" is uploaded to an S3 bucket prior to invoking the endpoint.
  • The question posed to the model is "What is in this image?", sent via runtime.invoke_endpoint with an application/json content type.
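The invocation above might look like the following sketch. The bucket name, endpoint name, and request body shape are placeholders (assumptions); the actual payload format depends on the container's inference handler.

```python
# Sketch of invoking the endpoint with boto3, per the steps above.
import json

def build_payload(image_s3_uri, question):
    """Build the application/json request body (assumed shape)."""
    return json.dumps({"inputs": {"image": image_s3_uri, "question": question}})

def invoke(endpoint_name, payload):
    """Call the SageMaker runtime; requires AWS credentials."""
    import boto3
    runtime = boto3.client("sagemaker-runtime")
    resp = runtime.invoke_endpoint(
        EndpointName=endpoint_name,        # placeholder endpoint name
        ContentType="application/json",
        Body=payload,
    )
    return json.loads(resp["Body"].read())

if __name__ == "__main__":
    body = build_payload("s3://my-bucket/spheres.jpeg", "What is in this image?")
    print(invoke("huggingface-vqa-endpoint", body))
```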

Results of Inference

Video description

In this video, you will learn how to run inference on SageMaker. To deploy FMs to production, SageMaker offers 80+ instance types and flexible deployment modes, such as real-time, asynchronous, serverless, and batch transform, so you can choose the right deployment mode for your use case. SageMaker offers specialized hosting containers such as Large Model Inference (LMI), Text Generation Inference (TGI), and PyTorch, as well as custom containers, along with the ability to optimize the container for performance and cost. Learn more: https://go.aws/3Vt466O