Getting Started With Hugging Face in 15 Minutes | Transformers, Pipeline, Tokenizer, Models
Getting Started with Hugging Face and Transformers Library
In this section, the speaker introduces the Hugging Face Transformers library, which is a popular NLP library in Python. The speaker explains how to install it and use it for various NLP tasks.
Installing the Transformers Library
- To install the Transformers library, first install your favorite deep learning library such as PyTorch or TensorFlow.
- Then, run `pip install transformers` to install the Transformers library.
Using Pipelines for NLP Tasks
- Pipelines make it easy to apply an NLP task by abstracting away many details.
- To create a pipeline object, import `pipeline` from `transformers`.
- Create a pipeline object by specifying a task such as sentiment analysis.
- Apply the pipeline object to input data and print the results.
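The steps above can be sketched in a few lines (on first run, the default English sentiment model is downloaded automatically):

```python
from transformers import pipeline

# Create a pipeline for sentiment analysis; with no model specified,
# a default sentiment model is downloaded from the Hugging Face Hub.
classifier = pipeline("sentiment-analysis")

# Apply the pipeline to input data and print the results,
# e.g. a list like [{'label': 'POSITIVE', 'score': 0.99...}]
result = classifier("I love using the Transformers library!")
print(result)
```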
Examples of Pipelines
- Text generation pipeline can be used to generate text using a specific model.
- Zero-shot classification can be used to classify text against candidate labels the model was not explicitly trained on.
- The official documentation lists all available tasks, including audio classification, automatic speech recognition, image classification, question answering, translation, and summarization.
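As a sketch of one of these tasks, a zero-shot classification pipeline might look like this (the input text and candidate labels are made up for illustration; the default model is downloaded on first use):

```python
from transformers import pipeline

# Zero-shot classification: score text against candidate labels
# the model was never fine-tuned on.
classifier = pipeline("zero-shot-classification")

result = classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)
print(result["labels"])   # labels sorted by score, highest first
print(result["scores"])   # one probability per label
```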
Behind the Pipeline: Tokenizer and Model Class
- `AutoTokenizer` is a generic class that applies tokenization to input text.
- `AutoModel` is likewise a generic model class; task-specific variants such as `AutoModelForSequenceClassification` add the appropriate output head for sequence classification.
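A minimal sketch of these two classes working together, assuming the commonly used `distilbert-base-uncased-finetuned-sst-2-english` checkpoint:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# The tokenizer turns text into tensors the model understands ...
inputs = tokenizer("I love this!", return_tensors="pt")

# ... and the model maps them to raw classification logits.
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.shape)  # one row, one score per class
```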
Introduction to Hugging Face
In this section, the speaker introduces Hugging Face and explains how to use pre-trained models and tokenizers.
Using Pre-Trained Models and Tokenizers
- To use pre-trained models and tokenizers in Hugging Face, call the model class's `from_pretrained` method.
- The same can be done for tokenizers.
- `from_pretrained` is an important method in Hugging Face that is used frequently.
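As an illustration, `from_pretrained` also accepts a local directory, so it pairs naturally with `save_pretrained` (the directory name below is arbitrary):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Save the tokenizer files to a local directory ...
tokenizer.save_pretrained("my-tokenizer")

# ... and reload them later with the same from_pretrained call.
reloaded = AutoTokenizer.from_pretrained("my-tokenizer")
print(reloaded("Hello world")["input_ids"])
```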
Understanding Tokenization
This section explains what a tokenizer does and how it works.
How Tokenization Works
- A tokenizer converts text into a numerical representation (token IDs) that the model understands.
- The tokenizer can be called directly with input text or a list of texts.
- Individual functions like `tokenize`, `convert_tokens_to_ids`, and `decode` can be applied separately to inspect the different outputs.
- Tokens have unique corresponding IDs, which can be decoded to get the original string back.
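The individual steps can be sketched like this, assuming the `bert-base-uncased` tokenizer (the example sentence is made up):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
text = "Tokenizing text is easy"

# Split the text into subword tokens ...
tokens = tokenizer.tokenize(text)
# ... map each token to its unique ID ...
ids = tokenizer.convert_tokens_to_ids(tokens)
# ... and decode the IDs back into a string.
decoded = tokenizer.decode(ids)

print(tokens)
print(ids)
print(decoded)  # this tokenizer lowercases its input
```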
Combining Code with PyTorch or TensorFlow
This section shows how to combine code with PyTorch or TensorFlow using Hugging Face.
Using PyTorch or TensorFlow with Hugging Face
- Hugging Face models and tokenizers integrate directly with ordinary PyTorch or TensorFlow code.
- Multiple sentences can be fed into the pipeline classifier by putting them in a list.
- The tokenizer can also be applied directly instead of doing separate functions.
- Inference in PyTorch involves unpacking the batch dictionary into the model call and applying functions like `softmax` and `argmax` to the output logits.
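A sketch of the full PyTorch inference flow described above, assuming the `distilbert-base-uncased-finetuned-sst-2-english` checkpoint (the example sentences are made up):

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

sentences = ["I love this movie", "This was a terrible idea"]

# Tokenize a batch of sentences; padding/truncation gives rectangular tensors.
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    # Unpack the batch dictionary into the model call.
    outputs = model(**batch)
    # softmax turns logits into probabilities; argmax picks the predicted class.
    probs = F.softmax(outputs.logits, dim=1)
    labels = torch.argmax(probs, dim=1)

print(probs)
print(labels)  # 0 = NEGATIVE, 1 = POSITIVE for this checkpoint
```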
Loading Pre-Trained Models and Tokenizers
In this section, the speaker explains how to load pre-trained models and tokenizers using the Hugging Face Transformers library.
Loading Pre-Trained Models and Tokenizers
- To load a pre-trained tokenizer, use `AutoTokenizer.from_pretrained()`.
- To load a pre-trained model, use `AutoModel.from_pretrained()`.
- These functions can be used together to get the same results as before.
- The Hugging Face Model Hub has almost 35,000 models available for use.
- Users can filter by pipeline tasks, libraries, data sets, languages or search for specific models.
- Users can find information about each model on its page in the Model Hub.
- To use a model from the Model Hub:
  - Copy the name of the model from its page
  - Use `pipeline()` with the appropriate task (e.g. text classification)
  - Pass the copied model name as the `model` argument
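Putting those steps together, a sketch using one example checkpoint from the Hub (`distilbert-base-uncased-finetuned-sst-2-english`):

```python
from transformers import pipeline

# Pass the model name copied from the Model Hub as the `model` argument.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

result = classifier("Hugging Face makes NLP easy")
print(result)
```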
Using Different Models from the Model Hub
In this section, users learn how to find and use different models from Hugging Face's Model Hub.
Finding and Using Different Models
- Users can filter by pipeline tasks or search for specific models in order to find what they need.
- Each model's page provides information about it including its name and any fine-tuning that has been done on it.
- Code examples are sometimes provided on these pages as well.
- To use a different model:
  - Copy its name from its page
  - Use `pipeline()` with the appropriate task (e.g. summarization)
  - Pass the copied model name as the `model` argument
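For example, a summarization pipeline with a specific Hub model might look like this (`facebook/bart-large-cnn` is one such model; the input text is made up):

```python
from transformers import pipeline

# Use a specific summarization model copied from the Model Hub.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = (
    "Hugging Face provides thousands of pre-trained models for tasks such as "
    "text classification, translation, and summarization. Models can be "
    "downloaded from the Model Hub and used with just a few lines of code, "
    "and the same API works with both PyTorch and TensorFlow."
)

summary = summarizer(article, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```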
Fine-Tuning Your Own Model
In this section, the speaker explains how to fine-tune your own model using Hugging Face's Transformers library.
Fine-Tuning Your Own Model
- To fine-tune your own model:
  - Prepare your own data set
  - Load a pre-trained tokenizer and call it on the data set to get encodings
  - Prepare a PyTorch data set from the encodings (if using PyTorch)
  - Load a pre-trained model
  - Use the `Trainer` class from the Transformers library and call `trainer.train()` with appropriate arguments to train the model on your prepared data set
- The official Hugging Face documentation provides excellent resources for fine-tuning models.
- Users can switch between PyTorch and TensorFlow code in the documentation.
- A link to the official documentation is provided in the video description.
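A compressed sketch of those fine-tuning steps, using a tiny made-up data set and a single training step so it runs quickly; real fine-tuning would use your actual data and many more steps:

```python
import torch
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# 1. Prepare your own data set (these toy examples are hypothetical).
texts = ["I loved it", "Absolutely awful", "Great fun", "Very boring"]
labels = [1, 0, 1, 0]

# 2. Load a pre-trained tokenizer and call it to get encodings.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
encodings = tokenizer(texts, padding=True, truncation=True)

# 3. Prepare a PyTorch data set from the encodings.
class ReviewDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)

train_dataset = ReviewDataset(encodings, labels)

# 4. Load a pre-trained model with a fresh classification head.
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# 5. Train with the Trainer class; max_steps=1 keeps this sketch cheap.
args = TrainingArguments(
    output_dir="finetune-out",
    max_steps=1,
    per_device_train_batch_size=2,
    report_to="none",
)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```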
Conclusion
In this section, the speaker concludes by summarizing what was covered in the tutorial and providing additional resources for users who want to learn more.
Conclusion
- Users can load pre-trained models and tokenizers using Hugging Face's Transformers library.
- The Hugging Face Model Hub has almost 35,000 models available for use.
- Users can find different models by filtering by pipeline tasks or searching for specific models.
- Users can fine-tune their own models using Hugging Face's Transformers library and following their official documentation.
- Additional resources are available in the video description.