Getting Started With Hugging Face in 15 Minutes | Transformers, Pipeline, Tokenizer, Models
Getting Started with Hugging Face and Transformers Library
In this section, the speaker introduces the Hugging Face Transformers library, which is a popular NLP library in Python. The speaker explains how to install it and use it for various NLP tasks.
Installing the Transformers Library
- To install the Transformers library, first install your favorite deep learning library such as PyTorch or TensorFlow.
- Then, run `pip install transformers` to install the Transformers library.
Using Pipelines for NLP Tasks
- Pipelines make it easy to apply an NLP task by abstracting away many details.
- To create a pipeline object, import `pipeline` from `transformers`.
- Create a pipeline object by specifying a task such as sentiment analysis.
- Apply the pipeline object to input data and print the results.
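The steps above can be sketched in a few lines (on first run, the default English sentiment model is downloaded automatically):

```python
from transformers import pipeline

# Create a pipeline for sentiment analysis; with no model specified,
# a default sentiment model is downloaded from the Hugging Face Hub.
classifier = pipeline("sentiment-analysis")

# Apply the pipeline to input data and print the results,
# e.g. a list like [{'label': 'POSITIVE', 'score': 0.99...}]
result = classifier("I love using the Transformers library!")
print(result)
```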
Examples of Pipelines
- Text generation pipeline can be used to generate text using a specific model.
- Zero-shot classification can be used to classify text against candidate labels the model was not explicitly trained on.
- The official documentation lists all available tasks, including audio classification, automatic speech recognition, image classification, question answering, translation, and summarization.
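As a sketch of one of these tasks, a zero-shot classification pipeline might look like this (the input text and candidate labels are made up for illustration; the default model is downloaded on first use):

```python
from transformers import pipeline

# Zero-shot classification: score text against candidate labels
# the model was never fine-tuned on.
classifier = pipeline("zero-shot-classification")

result = classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)
print(result["labels"])   # labels sorted by score, highest first
print(result["scores"])   # one probability per label
```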
Behind the Pipeline: Tokenizer and Model Class
- `AutoTokenizer` is a generic class that applies tokenization to input text.
- `AutoModel` is likewise a generic model class; task-specific variants such as `AutoModelForSequenceClassification` add the appropriate output head for sequence classification.
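A minimal sketch of these two classes working together, assuming the commonly used `distilbert-base-uncased-finetuned-sst-2-english` checkpoint:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# The tokenizer turns text into tensors the model understands ...
inputs = tokenizer("I love this!", return_tensors="pt")

# ... and the model maps them to raw classification logits.
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.shape)  # one row, one score per class
```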
Introduction to Hugging Face
In this section, the speaker introduces Hugging Face and explains how to use pre-trained models and tokenizers.
Using Pre-Trained Models and Tokenizers
- To use pre-trained models and tokenizers in Hugging Face, call the model class's `from_pretrained` method.
- The same can be done for tokenizers.
- `from_pretrained` is an important method in Hugging Face that is used frequently.
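As an illustration, `from_pretrained` also accepts a local directory, so it pairs naturally with `save_pretrained` (the directory name below is arbitrary):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Save the tokenizer files to a local directory ...
tokenizer.save_pretrained("my-tokenizer")

# ... and reload them later with the same from_pretrained call.
reloaded = AutoTokenizer.from_pretrained("my-tokenizer")
print(reloaded("Hello world")["input_ids"])
```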
Understanding Tokenization
This section explains what a tokenizer does and how it works.
How Tokenization Works
- A tokenizer converts text into a numerical representation (token IDs) that the model understands.
- The tokenizer can be called directly with input text or a list of texts.
- Individual functions like `tokenize`, `convert_tokens_to_ids`, and `decode` can be applied separately to inspect the different outputs.
- Tokens have unique corresponding IDs, which can be decoded to get the original string back.
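The individual steps can be sketched like this, assuming the `bert-base-uncased` tokenizer (the example sentence is made up):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
text = "Tokenizing text is easy"

# Split the text into subword tokens ...
tokens = tokenizer.tokenize(text)
# ... map each token to its unique ID ...
ids = tokenizer.convert_tokens_to_ids(tokens)
# ... and decode the IDs back into a string.
decoded = tokenizer.decode(ids)

print(tokens)
print(ids)
print(decoded)  # this tokenizer lowercases its input
```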
Combining Code with PyTorch or TensorFlow
This section shows how to combine code with PyTorch or TensorFlow using Hugging Face.
Using PyTorch or TensorFlow with Hugging Face
- Hugging Face models and tokenizers integrate directly with ordinary PyTorch or TensorFlow code.
- Multiple sentences can be fed into the pipeline classifier by putting them in a list.
- The tokenizer can also be applied directly instead of doing separate functions.
- Inference in PyTorch involves unpacking the batch dictionary into the model call and applying functions like `softmax` and `argmax` to the output logits.
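A sketch of the full PyTorch inference flow described above, assuming the `distilbert-base-uncased-finetuned-sst-2-english` checkpoint (the example sentences are made up):

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

sentences = ["I love this movie", "This was a terrible idea"]

# Tokenize a batch of sentences; padding/truncation gives rectangular tensors.
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    # Unpack the batch dictionary into the model call.
    outputs = model(**batch)
    # softmax turns logits into probabilities; argmax picks the predicted class.
    probs = F.softmax(outputs.logits, dim=1)
    labels = torch.argmax(probs, dim=1)

print(probs)
print(labels)  # 0 = NEGATIVE, 1 = POSITIVE for this checkpoint
```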
Loading Pre-Trained Models and Tokenizers
In this section, the speaker explains how to load pre-trained models and tokenizers using the Hugging Face Transformers library.
Loading Pre-Trained Models and Tokenizers
- To load a pre-trained tokenizer, use `AutoTokenizer.from_pretrained()`.
- To load a pre-trained model, use `AutoModel.from_pretrained()`.
- These functions can be used together to get the same results as before.
- The Hugging Face Model Hub has almost 35,000 models available for use.
- Users can filter by pipeline tasks, libraries, data sets, languages or search for specific models.
- Users can find information about each model on its page in the Model Hub.
- To use a model from the Model Hub:
  - Copy the name of the model from its page
  - Use `pipeline()` with the appropriate task (e.g. text classification)
  - Pass the copied model name as the `model` argument
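Putting those steps together, a sketch using one example checkpoint from the Hub (`distilbert-base-uncased-finetuned-sst-2-english`):

```python
from transformers import pipeline

# Pass the model name copied from the Model Hub as the `model` argument.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

result = classifier("Hugging Face makes NLP easy")
print(result)
```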
Using Different Models from the Model Hub
In this section, users learn how to find and use different models from Hugging Face's Model Hub.
Finding and Using Different Models
- Users can filter by pipeline tasks or search for specific models in order to find what they need.
- Each model's page provides information about it including its name and any fine-tuning that has been done on it.
- Code examples are sometimes provided on these pages as well.
- To use a different model:
  - Copy its name from its page
  - Use `pipeline()` with the appropriate task (e.g. summarization)
  - Pass the copied model name as the `model` argument
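For example, a summarization pipeline with a specific Hub model might look like this (`facebook/bart-large-cnn` is one such model; the input text is made up):

```python
from transformers import pipeline

# Use a specific summarization model copied from the Model Hub.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = (
    "Hugging Face provides thousands of pre-trained models for tasks such as "
    "text classification, translation, and summarization. Models can be "
    "downloaded from the Model Hub and used with just a few lines of code, "
    "and the same API works with both PyTorch and TensorFlow."
)

summary = summarizer(article, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```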
Fine-Tuning Your Own Model
In this section, the speaker explains how to fine-tune your own model using Hugging Face's Transformers library.
Fine-Tuning Your Own Model
- To fine-tune your own model:
  - Prepare your own data set
  - Load a pre-trained tokenizer and call it on the data set to get encodings
  - Prepare a PyTorch data set from the encodings (if using PyTorch)
  - Load a pre-trained model
  - Use the `Trainer` class from the Transformers library and call `trainer.train()` with appropriate arguments to train the model on your prepared data set
- The official Hugging Face documentation provides excellent resources for fine-tuning models.
- Users can switch between PyTorch and TensorFlow code in the documentation.
- A link to the official documentation is provided in the video description.
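A compressed sketch of those fine-tuning steps, using a tiny made-up data set and a single training step so it runs quickly; real fine-tuning would use your actual data and many more steps:

```python
import torch
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# 1. Prepare your own data set (these toy examples are hypothetical).
texts = ["I loved it", "Absolutely awful", "Great fun", "Very boring"]
labels = [1, 0, 1, 0]

# 2. Load a pre-trained tokenizer and call it to get encodings.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
encodings = tokenizer(texts, padding=True, truncation=True)

# 3. Prepare a PyTorch data set from the encodings.
class ReviewDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)

train_dataset = ReviewDataset(encodings, labels)

# 4. Load a pre-trained model with a fresh classification head.
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# 5. Train with the Trainer class; max_steps=1 keeps this sketch cheap.
args = TrainingArguments(
    output_dir="finetune-out",
    max_steps=1,
    per_device_train_batch_size=2,
    report_to="none",
)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```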
Conclusion
In this section, the speaker concludes by summarizing what was covered in the tutorial and providing additional resources for users who want to learn more.
Conclusion
- Users can load pre-trained models and tokenizers using Hugging Face's Transformers library.
- The Hugging Face Model Hub has almost 35,000 models available for use.
- Users can find different models by filtering by pipeline tasks or searching for specific models.
- Users can fine-tune their own models using Hugging Face's Transformers library and following their official documentation.
- Additional resources are available in the video description.