Getting started with pre-training foundation models on Amazon SageMaker | Amazon Web Services

Getting Started with Pre-Training Foundation Models on SageMaker

Section Overview

This section introduces the process of pre-training a foundation model on Amazon SageMaker, focusing on the Llama 2 70-billion-parameter model.

Introduction to SageMaker Studio

  • Emily introduces the session and outlines the goal: pre-training a Llama 2 model using SageMaker.
  • Acknowledgment is given to Arun and other contributors for developing the resources used in this tutorial.
  • The environment is JupyterLab within SageMaker Studio, where the necessary files are organized in a directory named "pre-train llama."

Setting Up the Environment

  • Initial steps include installing required packages locally within the notebook before downloading datasets.
  • The dataset used is the wiki corpus, which is tokenized locally; the video stresses using an instance with enough compute for this step (a c5.18xlarge).
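
The local tokenization step can be pictured with a minimal sketch. It assumes a toy whitespace tokenizer in place of the real Llama 2 tokenizer and an arbitrary block size; the names toy_tokenize and pack_into_blocks are hypothetical and do not come from the tutorial. It shows the common pre-training preparation of concatenating tokenized documents and cutting the result into fixed-length blocks.

```python
from typing import Dict, Iterable, List


def toy_tokenize(text: str, vocab: Dict[str, int]) -> List[int]:
    """Stand-in tokenizer (assumption): one integer id per whitespace-separated word."""
    ids = []
    for word in text.split():
        # Assign the next free id the first time a word is seen.
        ids.append(vocab.setdefault(word, len(vocab)))
    return ids


def pack_into_blocks(docs: Iterable[str], block_size: int) -> List[List[int]]:
    """Concatenate tokenized documents and cut them into equal-length blocks."""
    vocab: Dict[str, int] = {}
    stream: List[int] = []
    for doc in docs:
        stream.extend(toy_tokenize(doc, vocab))
    # Drop the trailing remainder so every block has exactly block_size tokens.
    usable = len(stream) - len(stream) % block_size
    return [stream[i:i + block_size] for i in range(0, usable, block_size)]


if __name__ == "__main__":
    print(pack_into_blocks(["a b c", "d e f g"], block_size=3))
```

In a real run the stand-in tokenizer would be replaced by the model's own tokenizer, and the block size would match the sequence length used for training.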

Tokenization and Data Upload

  • After local tokenization, users are instructed to upload their processed data to an S3 bucket for further use.
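
As a rough sketch of the upload step, the code below walks a local directory of tokenized files and sends each one to S3 with boto3's upload_file. The directory layout, bucket, and key prefix are placeholders chosen for illustration, and the helper names plan_uploads and upload_dir are hypothetical; the tutorial may upload the data differently.

```python
import os
from typing import List, Tuple


def plan_uploads(local_dir: str, prefix: str) -> List[Tuple[str, str]]:
    """Pair each file under local_dir with the S3 key it would be stored at."""
    pairs = []
    for root, _dirs, files in os.walk(local_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            # Keep the relative layout and use "/" as the S3 key separator.
            rel = os.path.relpath(path, local_dir).replace(os.sep, "/")
            pairs.append((path, f"{prefix.rstrip('/')}/{rel}"))
    return pairs


def upload_dir(local_dir: str, bucket: str, prefix: str) -> int:
    """Upload every planned file; returns the number of files sent."""
    import boto3  # imported lazily so plan_uploads works without AWS dependencies

    s3 = boto3.client("s3")
    pairs = plan_uploads(local_dir, prefix)
    for path, key in pairs:
        s3.upload_file(path, bucket, key)
    return len(pairs)
```

Calling upload_dir requires AWS credentials and an existing bucket; plan_uploads can be run on its own to check the resulting keys first.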

Configuring Training Parameters

  • Key hyperparameters for training are highlighted, including the world size (the total number of worker processes) and the parallelism configuration.
  • The base Docker image utilized is an AWS-managed deep learning container optimized for PyTorch training with NeuronX technology.
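
The relationship between world size and the parallelism settings can be sketched with a little arithmetic. The numbers below are illustrative assumptions, not the values used in the video: 32 NeuronCores per trn1.32xlarge instance, four instances, and a tensor-parallel degree of 8. The function names are hypothetical.

```python
def world_size(instance_count: int, cores_per_instance: int) -> int:
    """Total number of worker processes, assuming one per accelerator core."""
    return instance_count * cores_per_instance


def data_parallel_degree(world: int, tensor_parallel: int,
                         pipeline_parallel: int = 1) -> int:
    """Number of model replicas left after splitting each replica across cores."""
    model_parallel = tensor_parallel * pipeline_parallel
    if world % model_parallel != 0:
        raise ValueError("world size must be divisible by the model-parallel degree")
    return world // model_parallel


if __name__ == "__main__":
    # Assumed values for illustration only.
    world = world_size(instance_count=4, cores_per_instance=32)
    print(world, data_parallel_degree(world, tensor_parallel=8))
```

With these assumed values the job has 128 workers, and each group of 8 workers holds one copy of the model, which leaves 16 data-parallel replicas.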

Executing Training Jobs

  • The training script (run_llama_nxd) used with the PyTorch estimator can be downloaded from GitHub; it runs the training job on Trainium instances.
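
A training job of this kind could be launched roughly as follows with the SageMaker Python SDK's PyTorch estimator. This is a sketch under assumptions rather than the tutorial's notebook code: the .py file name, the ./scripts source directory, the instance type, the "train" channel name, and the helper names estimator_kwargs and launch are all illustrative, and the role, image URI, and hyperparameters must be supplied by the caller.

```python
from typing import Any, Dict


def estimator_kwargs(role: str, image_uri: str, instance_count: int,
                     hyperparameters: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the estimator arguments in one place so they can be inspected."""
    return {
        "entry_point": "run_llama_nxd.py",    # assumed file name for the script
        "source_dir": "./scripts",            # hypothetical local directory
        "role": role,                         # IAM role ARN supplied by the caller
        "image_uri": image_uri,               # Neuron deep learning container image
        "instance_count": instance_count,
        "instance_type": "ml.trn1.32xlarge",  # assumed Trainium instance type
        "hyperparameters": hyperparameters,
        # Start the workers with torchrun-style distributed launching.
        "distribution": {"torch_distributed": {"enabled": True}},
    }


def launch(kwargs: Dict[str, Any], train_s3_uri: str):
    """Create the estimator and start the training job (requires AWS access)."""
    from sagemaker.pytorch import PyTorch  # lazy import: needs the sagemaker SDK

    estimator = PyTorch(**kwargs)
    estimator.fit({"train": train_s3_uri})
    return estimator
```

Building the arguments separately from the launch call makes it possible to review the configuration before any billable job is started.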
Video description

To train your own foundation models (FMs) on SageMaker, you can read your training data from and write model artifacts to an S3 bucket, choose from a wide range of compute instances managed by SageMaker, and create a training job. This video shows how to pre-train a Llama 2 model using SageMaker on AWS Trainium instances. Learn more: https://go.aws/3Vgd31M