LangChain - Using Hugging Face Models locally (code walkthrough)

Name: LangChain - Using Hugging Face Models locally (code walkthrough)
Uploaded: 2023-03-08T13:00:07.000Z
Duration: 20 min 22 s

Using Models on Hugging Face Hub and Locally

In this video, the speaker discusses how to use models hosted on Hugging Face, both through the Hugging Face Hub and locally. They explain the two different approaches and highlight the advantages of using local models.

Using Models with Hugging Face Hub

The most common way to use models is through the Hugging Face Hub, by calling its hosted inference API with your Hugging Face API token.

However, the Hub integration does not support every model. At the time of the video it worked mainly with text2text-generation models (encoder-decoder models such as BART and T5) and text-generation models (decoder-only models).

The advantage of using the Hugging Face Hub is that it is convenient and easy to access a wide range of models.

Loading Models Locally

Loading models locally allows for fine-tuning and using GPU-hosted models without relying on the Hugging Face Hub.

Some models only work when loaded locally, making it necessary to use this approach in certain cases.

By loading models locally, you can have more control over model usage and avoid potential limitations of the Hugging Face Hub.

Example: Using Flan Model with Hugging Face Hub

Set up an LLM chain whose prompt template ends with a cue such as "Let's think step by step."

Instantiate the model from its Hugging Face Hub repository (e.g., Flan-T5 XL).

Run the chain on input questions or prompts to generate responses.
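The three steps above can be sketched roughly as follows. This is a minimal sketch, assuming the early-2023 LangChain API that was current when the video was uploaded (`HuggingFaceHub`, `LLMChain`, and `PromptTemplate` importable from `langchain`); later releases have moved or deprecated these classes. The repo ID `google/flan-t5-xl`, the `model_kwargs` values, and the helper name `build_hub_chain` are assumptions for illustration, not taken from the summary.

```python
import os

# Prompt template ending with the step-by-step cue described above.
TEMPLATE = "Question: {question}\n\nAnswer: Let's think step by step."


def build_hub_chain(repo_id="google/flan-t5-xl"):
    """Build an LLMChain backed by a model served from the Hugging Face Hub.

    build_hub_chain is a hypothetical helper name. The LangChain import is
    kept inside the function so this file loads even without LangChain.
    """
    from langchain import HuggingFaceHub, LLMChain, PromptTemplate

    # The Hub wrapper reads the API token from this environment variable.
    if "HUGGINGFACEHUB_API_TOKEN" not in os.environ:
        raise RuntimeError("Set HUGGINGFACEHUB_API_TOKEN before calling the Hub.")

    prompt = PromptTemplate(template=TEMPLATE, input_variables=["question"])
    # Generation settings below are illustrative assumptions.
    llm = HuggingFaceHub(
        repo_id=repo_id,
        model_kwargs={"temperature": 0.1, "max_length": 64},
    )
    return LLMChain(prompt=prompt, llm=llm)
```

With a valid token set, `build_hub_chain().run("What is the capital of France?")` would send the formatted prompt to the hosted model and return its text response.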

Example: Limitations with Blenderbot Model on Hugging Face Hub

The Blenderbot chat model is a conversational AI model rather than a text-generation or text2text-generation model, so the Hugging Face Hub integration may not support it.

Attempting to use this model through the Hub can therefore result in errors.

Advantages of Local Model Usage

Fine-tuning and customization options are available when using models locally.

Models can run on your own GPU, which is typically faster than running them on CPU.

Example: Using Flan Model Locally

Load a smaller Flan-T5 checkpoint locally so that it fits comfortably in GPU memory.

Use the Hugging Face pipeline, which handles tokenization and decoding for you, to generate responses with the generation parameters you pass in.
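As an illustration of the local approach, the sketch below loads a small Flan-T5 checkpoint with `transformers` and wraps the resulting pipeline for LangChain. It assumes the early-2023 LangChain import path `langchain.llms.HuggingFacePipeline`; the checkpoint `google/flan-t5-small`, the default `max_length`, and the helper name `build_local_flan` are assumptions chosen for the example, not details from the video summary.

```python
def build_local_flan(model_id="google/flan-t5-small", max_length=100):
    """Return a LangChain LLM backed by a locally loaded Flan-T5 model.

    build_local_flan is a hypothetical helper name. Imports live inside the
    function so nothing is downloaded until the function is actually called.
    """
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline
    from langchain.llms import HuggingFacePipeline

    # Flan-T5 is an encoder-decoder model, so it uses the seq2seq Auto class.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # The pipeline takes care of tokenizing the input and decoding the output.
    pipe = pipeline(
        "text2text-generation",
        model=model,
        tokenizer=tokenizer,
        max_length=max_length,
    )
    return HuggingFacePipeline(pipeline=pipe)
```

The object returned here can be passed to a chain in place of a Hub-backed model, or called directly with a plain string.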

Setting up the Pipeline

In this section, the speaker discusses how to set up the pipeline for different tasks using Hugging Face models.

The pipeline needs to be configured based on the specific task, such as classification or named entity recognition.

At the time of the video, the language model wrapper did not support every Hugging Face pipeline task.

To set up the pipeline, specify the task, the model, and the tokenizer, and set a maximum length for generation.

Using Language Models Locally

This section explains how to use language models locally with Hugging Face.

Once wrapped, a local model can be used just like any other large language model.

Call the model directly with a raw string, without a prompt template, for plain conditional generation.

Decoder-only models are set up differently from encoder-decoder models.

For decoder-only models, use AutoTokenizer and AutoModelForCausalLM (causal language modeling).
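The difference in setup between the two architectures can be sketched as below. The mapping function simply restates which `transformers` Auto class and pipeline task go with which architecture; the decoder-only loader assumes the `gpt2` checkpoint and the early-2023 `langchain.llms.HuggingFacePipeline` wrapper. Both helper names (`auto_class_and_task`, `build_local_decoder`) are hypothetical.

```python
def auto_class_and_task(is_encoder_decoder):
    """Map an architecture to its transformers Auto class name and pipeline task."""
    if is_encoder_decoder:
        # Encoder-decoder models (e.g. T5, BART).
        return "AutoModelForSeq2SeqLM", "text2text-generation"
    # Decoder-only models (e.g. GPT-2).
    return "AutoModelForCausalLM", "text-generation"


def build_local_decoder(model_id="gpt2", max_length=100):
    """Return a LangChain LLM backed by a locally loaded decoder-only model.

    The model ID and max_length defaults are illustrative assumptions.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
    from langchain.llms import HuggingFacePipeline

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Decoder-only models use the "text-generation" task, not "text2text-generation".
    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_length=max_length,
    )
    return HuggingFacePipeline(pipeline=pipe)
```

If the architecture of a checkpoint is unknown, the `is_encoder_decoder` attribute of its `transformers` config is one way to decide which branch applies.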

Working with Blenderbot Model Locally

This section focuses on working with the Blenderbot model locally using Hugging Face.

Load Blenderbot as an encoder-decoder model using AutoTokenizer and AutoModelForSeq2SeqLM (sequence-to-sequence modeling).

Specify text2text-generation as the pipeline task.

Despite being a conversational AI model, it can be used for text-to-text generation locally.

The results obtained from this smaller-sized model are coherent and conversational.
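A rough sketch of this local Blenderbot setup, using `transformers` alone, might look like the following. The checkpoint `facebook/blenderbot-400M-distill` is an assumption (the summary only says a smaller-sized model was used), as are the `max_length` default and the helper name `chat_once`.

```python
def chat_once(user_text, model_id="facebook/blenderbot-400M-distill", max_length=100):
    """Generate a single Blenderbot reply to user_text with a local pipeline.

    chat_once is a hypothetical helper; it reloads the model on every call,
    which keeps the sketch short but would be wasteful in a real application.
    """
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

    # Blenderbot is an encoder-decoder model, so the seq2seq Auto class applies.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Treat the conversational model as a plain text-to-text generator.
    pipe = pipeline(
        "text2text-generation",
        model=model,
        tokenizer=tokenizer,
        max_length=max_length,
    )
    # The pipeline returns a list with one dict per input.
    return pipe(user_text)[0]["generated_text"]
```

Calling `chat_once("What are you doing this weekend?")` would download the checkpoint on first use and return the model's reply as a string.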

Embedding Models Locally

This section covers running embedding models locally using the sentence-transformers package and Hugging Face embeddings.

Use the sentence-transformers package and its models to convert text into 768-dimensional vectors.

Set up the model with HuggingFaceEmbeddings, then use it to embed a single piece of text or a list of documents.

These embeddings are useful for tasks such as semantic search and other downstream applications.
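To make the semantic search use case concrete, here is a minimal sketch that embeds texts and ranks documents by cosine similarity to a query. It assumes the early-2023 import path `langchain.embeddings.HuggingFaceEmbeddings` and the model `sentence-transformers/all-mpnet-base-v2`, which produces 768-dimensional vectors; the helper names are hypothetical, and the similarity ranking is a generic technique rather than something shown in the video summary.

```python
import math


def embed_texts(texts, model_name="sentence-transformers/all-mpnet-base-v2"):
    """Embed a list of strings locally; returns one vector (list of floats) per text."""
    # Imported lazily so the ranking helpers below work without LangChain installed.
    from langchain.embeddings import HuggingFaceEmbeddings

    embedder = HuggingFaceEmbeddings(model_name=model_name)
    return embedder.embed_documents(texts)


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, vec) for vec in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
```

In practice the query would be embedded with the same model as the documents, and the top-ranked indices would select the passages to return.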

Conclusion

Using Hugging Face models locally provides flexibility in setting up pipelines, working with different types of models, and generating coherent responses. It allows for experimentation and exploration of various tasks such as classification, summarization, and conversational AI.