LangChain - Using Hugging Face Models locally (code walkthrough)
Using Models on Hugging Face Hub and Locally
In this video, the speaker discusses how to use models hosted on Hugging Face, both through the Hugging Face Hub and locally. They explain the two different approaches and highlight the advantages of using local models.
Using Models with Hugging Face Hub
- The most common way to use models is through the Hugging Face Hub by pinging an API with your Hugging Face API token.
- However, not all models are supported through the Hugging Face Hub: it mainly handles text2text-generation models (encoder-decoder models like BART and T5) and text-generation models (decoder-only models).
- The advantage of using the Hugging Face Hub is that it is convenient and easy to access a wide range of models.
Loading Models Locally
- Loading models locally allows for fine-tuning and using GPU-hosted models without relying on the Hugging Face Hub.
- Some models only work when loaded locally, making it necessary to use this approach in certain cases.
- By loading models locally, you can have more control over model usage and avoid potential limitations of the Hugging Face Hub.
Example: Using Flan Model with Hugging Face Hub
- Set up an LLM chain with a prompt template that includes a cue like "Let's think about it step by step."
- Instantiate the model from its Hugging Face Hub repository (e.g., google/flan-t5-xl).
- Use the instantiated model to generate responses based on input questions or prompts.
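The steps above can be sketched roughly as follows. This is a minimal sketch, assuming an older LangChain release in which `HuggingFaceHub`, `LLMChain`, and `PromptTemplate` can be imported from the top-level `langchain` package (newer releases move them), and assuming the API token is stored in the `HUGGINGFACEHUB_API_TOKEN` environment variable. The exact template wording, the `model_kwargs` values, and the helper names `build_prompt` and `make_hub_chain` are illustrative and do not come from the video.

```python
import os

# Assumed template wording; only the step-by-step cue comes from the summary.
TEMPLATE = "Question: {question}\n\nAnswer: Let's think about it step by step."


def build_prompt(question: str) -> str:
    """Fill the step-by-step template with the user's question."""
    return TEMPLATE.format(question=question)


def make_hub_chain(repo_id: str = "google/flan-t5-xl"):
    """Build a chain backed by a model served from the Hugging Face Hub."""
    if "HUGGINGFACEHUB_API_TOKEN" not in os.environ:
        raise RuntimeError("Set HUGGINGFACEHUB_API_TOKEN before calling the Hub.")

    # Imported lazily so LangChain is only needed when the chain is built.
    # These import paths are an assumption tied to older LangChain releases.
    from langchain import HuggingFaceHub, LLMChain, PromptTemplate

    prompt = PromptTemplate(template=TEMPLATE, input_variables=["question"])
    llm = HuggingFaceHub(
        repo_id=repo_id,
        model_kwargs={"temperature": 0.1, "max_length": 64},  # assumed values
    )
    return LLMChain(prompt=prompt, llm=llm)
```

With the token set, `make_hub_chain().run("What is the capital of France?")` would send the filled-in prompt to the hosted model and return its answer as a string.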
Example: Limitations with Blenderbot Model on Hugging Face Hub
- The Blenderbot chat model may not be usable through the Hugging Face Hub for text generation tasks.
- Because it is published as a conversational AI model rather than a text generation model, calling it through the Hub may result in errors.
Advantages of Local Model Usage
- Fine-tuning and customization options are available when using models locally.
- GPU-hosted models can be utilized, providing faster performance compared to non-GPU versions.
Example: Using Flan Model Locally
- Load a smaller version of the flan model locally to avoid overloading the GPU.
- Utilize the Hugging Face pipeline to simplify tokenization and generate responses based on input parameters.
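A local load along these lines might look like the sketch below. It assumes the `transformers` package with a backend such as PyTorch is installed; the checkpoint `google/flan-t5-small`, the default `max_length`, and the helper names are assumptions chosen for illustration. The `text2text-generation` task name reflects the Transformers releases of the video's era and may differ in later versions.

```python
def load_flan_pipeline(model_id: str = "google/flan-t5-small", max_length: int = 100):
    """Load a Flan-T5 checkpoint locally and wrap it in a generation pipeline."""
    # Imported lazily: transformers is only required when the model is loaded.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    # The pipeline handles tokenization, generation, and decoding in one call.
    return pipeline(
        "text2text-generation",
        model=model,
        tokenizer=tokenizer,
        max_length=max_length,
    )


def first_generated_text(outputs) -> str:
    """Extract the text from the list of dicts a generation pipeline returns."""
    return outputs[0]["generated_text"]
```

For example, `first_generated_text(load_flan_pipeline()("Translate to German: Good morning"))` would return the decoded model output as a plain string.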
Setting up the Pipeline
In this section, the speaker discusses how to set up the pipeline for different tasks using Hugging Face models.
- The pipeline needs to be configured based on the specific task, such as classification or named entity recognition.
- Currently, not every pipeline task is supported when a Hugging Face pipeline is used as a language model.
- To set up the pipeline, specify the task, the model, and the tokenizer, and set a maximum length (max_length) that caps the generated sequence.
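One way to guard against unsupported tasks before handing a pipeline to LangChain is sketched below. The list of supported tasks is an assumption that matches the two generation tasks discussed here; the real set depends on the installed LangChain version. The import path `langchain.llms.HuggingFacePipeline` is likewise tied to older releases, and `is_supported_task` and `wrap_for_langchain` are hypothetical helper names.

```python
# Assumed set of tasks the LangChain wrapper accepts; check your version.
SUPPORTED_TASKS = ("text2text-generation", "text-generation")


def is_supported_task(task: str) -> bool:
    """Return True if the pipeline task can act as a text-in, text-out LLM."""
    return task in SUPPORTED_TASKS


def wrap_for_langchain(hf_pipeline):
    """Wrap a transformers pipeline so LangChain can call it like any LLM."""
    # transformers pipelines expose their task name via the `task` attribute.
    if not is_supported_task(hf_pipeline.task):
        raise ValueError(f"Task {hf_pipeline.task!r} is not a generation task.")

    # Imported lazily; the path is an assumption tied to older LangChain releases.
    from langchain.llms import HuggingFacePipeline

    return HuggingFacePipeline(pipeline=hf_pipeline)
```

Checking the task first means a classification or named entity recognition pipeline fails with a clear message instead of an error deep inside the wrapper.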
Using Language Models Locally
This section explains how to use language models locally with Hugging Face.
- Use local language models just like any large language model.
- Query the model directly, without a prompt template, for plain conditional generation.
- Set up decoder-only models differently from encoder-decoder models.
- For decoder-only models, use AutoTokenizer and AutoModelForCausalLM, which load the model for causal language modeling.
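The difference between the two model families can be sketched as a small loader that picks the model class and pipeline task from the checkpoint's configuration. This is a sketch under assumptions: `gpt2` is only an example of a decoder-only checkpoint (the summary does not name one), and `task_for` and `load_generation_pipeline` are hypothetical helpers.

```python
def task_for(is_encoder_decoder: bool) -> str:
    """Map the model architecture to the matching pipeline task name."""
    return "text2text-generation" if is_encoder_decoder else "text-generation"


def load_generation_pipeline(model_id: str = "gpt2", max_length: int = 100):
    """Load either kind of generative model with the matching Auto class."""
    # Imported lazily: transformers is only required when a model is loaded.
    from transformers import (
        AutoConfig,
        AutoModelForCausalLM,
        AutoModelForSeq2SeqLM,
        AutoTokenizer,
        pipeline,
    )

    # The config records whether the checkpoint is an encoder-decoder model.
    config = AutoConfig.from_pretrained(model_id)
    if config.is_encoder_decoder:
        model_cls = AutoModelForSeq2SeqLM  # e.g. T5, BART
    else:
        model_cls = AutoModelForCausalLM   # decoder-only, e.g. GPT-2

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = model_cls.from_pretrained(model_id)
    return pipeline(
        task_for(config.is_encoder_decoder),
        model=model,
        tokenizer=tokenizer,
        max_length=max_length,
    )
```

Reading the architecture from the config avoids hard-coding the class per checkpoint, at the cost of one extra configuration lookup.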
Working with Blenderbot Model Locally
This section focuses on working with the Blenderbot model locally using Hugging Face.
- Bring in Blenderbot as an encoder-decoder model using AutoTokenizer and AutoModelForSeq2SeqLM.
- Specify the pipeline task as text2text-generation.
- Despite being a conversational AI model, it can be used for text-to-text generation locally.
- The results obtained from this smaller-sized model are coherent and conversational.
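A local Blenderbot setup along these lines might look like the following sketch. The checkpoint `facebook/blenderbot-400M-distill` is an assumption standing in for the "smaller-sized model" mentioned above, and `load_blenderbot` and `run_chat` are hypothetical helpers. Each message is sent on its own, so no conversation history is carried between turns.

```python
def load_blenderbot(model_id: str = "facebook/blenderbot-400M-distill",
                    max_length: int = 100):
    """Load Blenderbot as a seq2seq model behind a text2text pipeline."""
    # Imported lazily: transformers is only required when the model is loaded.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return pipeline(
        "text2text-generation",
        model=model,
        tokenizer=tokenizer,
        max_length=max_length,
    )


def run_chat(generator, messages):
    """Send each message to the generator and collect the stripped replies."""
    replies = []
    for message in messages:
        # Generation pipelines return a list of dicts with "generated_text".
        outputs = generator(message)
        replies.append(outputs[0]["generated_text"].strip())
    return replies
```

Because `run_chat` accepts any callable with the pipeline's output shape, it can be exercised with a stub before downloading the model, for example `run_chat(load_blenderbot(), ["Hello, how are you?"])` once the checkpoint is available.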
Embedding Models Locally
This section covers embedding models locally using the sentence-transformers package and Hugging Face embeddings.
- Use the sentence-transformers package and its models to convert text into 768-dimensional vectors.
- Set up the model with HuggingFaceEmbeddings, then pass it text or documents to embed.
- Useful for tasks like semantic search or creating embeddings for downstream applications.
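To make the semantic search use case concrete, the sketch below pairs a local embedder with a plain-Python cosine similarity ranking. It assumes an older LangChain release where `HuggingFaceEmbeddings` lives in `langchain.embeddings`, and it assumes the `sentence-transformers/all-mpnet-base-v2` checkpoint, which produces 768-dimensional vectors. The helper names are hypothetical.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rank_documents(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar."""
    scores = [cosine_similarity(query_vec, doc) for doc in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


def load_embedder(model_name: str = "sentence-transformers/all-mpnet-base-v2"):
    """Create a local embedding model through LangChain's wrapper."""
    # Imported lazily; the path is an assumption tied to older LangChain releases.
    from langchain.embeddings import HuggingFaceEmbeddings

    return HuggingFaceEmbeddings(model_name=model_name)
```

With an embedder loaded, `embed_query(text)` returns one vector and `embed_documents(texts)` returns a list of vectors, which can be passed straight to `rank_documents`.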
Conclusion
Using Hugging Face models locally provides flexibility in setting up pipelines, working with different types of models, and generating coherent responses. It allows for experimentation and exploration of various tasks such as classification, summarization, and conversational AI.