Talk to YOUR DATA without OpenAI APIs: LangChain

Information Retrieval in Text and PDF Files

In this video, the speaker demonstrates how to perform information retrieval on text and PDF files without using OpenAI's embeddings. The speaker introduces different concepts and models for dealing with text documents and for working with multiple PDF files.

Installing Required Packages

  • To work with open source models in LangChain, we need to install the langchain, huggingface_hub, and sentence_transformers packages.
  • We also need a Hugging Face Hub API token to use models from Hugging Face. This can be obtained from your Hugging Face account settings, either by creating a new token or by reusing an existing one.
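The setup above can be sketched as follows, assuming a Colab-style notebook; the token value below is a placeholder, not a real credential:

```python
import os

# In a notebook cell you would first run:
#   !pip install langchain huggingface_hub sentence_transformers
# LangChain's Hub integrations read the API token from this
# environment variable.
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "hf_your_token_here"  # placeholder
```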

Working with Text Documents

  • We start by importing the requests library to read a text file from a URL.
  • The text file is then stored on Google Drive as "state_of_the_union.txt".
  • We use the TextLoader class from LangChain's document loaders to load the text document.
  • The document is divided into smaller chunks of 1,000 characters using CharacterTextSplitter.
  • These chunks are stored in the documents object.
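The chunking step can be sketched in plain Python; this is a simplified stand-in for what CharacterTextSplitter does, not the library's actual implementation (which also respects separators such as newlines):

```python
def split_into_chunks(text, chunk_size=1000, chunk_overlap=0):
    # Naive fixed-size character splitter: slide a window of
    # chunk_size characters, advancing by chunk_size - chunk_overlap.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

With a chunk size of 1,000 and no overlap, a 2,500-character document yields three chunks of 1,000, 1,000, and 500 characters.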

Computing Embeddings

  • We use HuggingFaceEmbeddings instead of OpenAI's embeddings to compute embeddings for our documents.
  • Several other types of open source embeddings are available in LangChain and can be used depending on the application.
  • A vector store is needed for information retrieval. In this case, we use FAISS (the faiss-cpu package).
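LangChain and FAISS handle the vector store for you; as a plain-Python illustration of what a similarity-search store does (brute-force cosine similarity over toy vectors, not FAISS's actual index structures):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

class TinyVectorStore:
    """Toy stand-in for FAISS: store (embedding, text) pairs and
    rank them by cosine similarity to a query embedding."""
    def __init__(self):
        self._items = []

    def add(self, embedding, text):
        self._items.append((embedding, text))

    def similarity_search(self, query_embedding, k=1):
        ranked = sorted(self._items,
                        key=lambda item: cosine_similarity(item[0], query_embedding),
                        reverse=True)
        return [text for _, text in ranked[:k]]
```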

Querying Documents

  • To query our documents, we compute the query's embedding and perform a similarity search between it and the document embeddings.
  • For example, if our query is "What did the president say about the Supreme Court?", it will find documents that have text closest to this query.
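To make the query step concrete, here is a toy demo that uses word-count vectors in place of real sentence-transformer embeddings; the documents and query below are illustrative:

```python
import math

def embed(text, vocab):
    # Toy "embedding": count how often each vocabulary word appears.
    words = text.lower().replace("?", "").split()
    return [float(words.count(w)) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

docs = [
    "The president praised the Supreme Court nominee.",
    "The bill lowers the cost of insulin.",
]
vocab = sorted({w for d in docs for w in d.lower().split()})

query = "What did the president say about the Supreme Court?"
q_vec = embed(query, vocab)

# Rank documents by similarity to the query embedding.
best = max(docs, key=lambda d: cosine(embed(d, vocab), q_vec))
```

The document mentioning the Supreme Court scores highest, mirroring what the real embedding-based similarity search returns.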

Question Answering with Hugging Face Hub and Google's Flan-T5-XL

In this section, the speaker explains how to use the Hugging Face Hub and Google's Flan-T5-XL model for question answering. They demonstrate how to create a chain and run a query or prompt.

Creating a Chain

  • To create a chain for question answering, the speaker uses a large language model from the Hugging Face Hub: Google's Flan-T5-XL.
  • The next step is to create a chain by passing in the LLM that was just created.
  • The chain takes a query or prompt, performs a similarity search over the documents based on the embeddings provided, and then stuffs the retrieved documents together to generate the response.
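The combining step the chain performs can be sketched as follows. This is a toy stand-in for a "stuff"-style Q&A chain, not LangChain's implementation, and `llm` here is any hypothetical callable mapping a prompt to a completion:

```python
def stuff_chain(llm, retrieved_docs, question):
    # "Stuff" combine step: concatenate the retrieved chunks into a
    # single prompt and hand that prompt to the language model.
    context = "\n\n".join(retrieved_docs)
    prompt = (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return llm(prompt)
```

With an echo LLM (`lambda p: p`) you can inspect the exact prompt the chain would send to the model.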

Running a Query or Prompt

  • To run a query or prompt, pass in your query using the chain.run function.
  • The response is based on the information in the documents retrieved for the query.

Using Multiple PDF Files for Question Answering

In this section, the speaker demonstrates how to use multiple PDF files for question answering. They explain how to install required packages and import necessary functions.

Loading PDF Files

  • Install required packages before loading PDF files.
  • Import two different classes: UnstructuredPDFLoader and VectorstoreIndexCreator.
  • Create loaders to load multiple PDF files.

Creating Vector Store Index

  • Use the VectorstoreIndexCreator class, which accepts embeddings (in this case from Hugging Face).
  • Divide the documents into chunks using a chunk size of 1,000.
  • Pass in the loaders that were created earlier so that embeddings can be computed from these documents.
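What the index creator wires together (splitting, embedding, storing, querying) can be illustrated with a toy index. Real code would use Hugging Face embeddings and FAISS; the word-count vectors below are illustrative stand-ins:

```python
import math

def split(text, chunk_size=1000):
    # Fixed-size character splitting, as in the splitter step.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def embed(text, vocab):
    # Toy word-count "embedding" over a fixed vocabulary.
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

class TinyIndex:
    """Toy stand-in for an index creator: split documents into
    chunks, embed each chunk, and store (embedding, chunk) pairs."""
    def __init__(self, texts, chunk_size=1000):
        chunks = [c for t in texts for c in split(t, chunk_size)]
        self.vocab = sorted({w for c in chunks for w in c.lower().split()})
        self.store = [(embed(c, self.vocab), c) for c in chunks]

    def query(self, question, k=1):
        q = embed(question, self.vocab)
        ranked = sorted(self.store, key=lambda it: cosine(it[0], q), reverse=True)
        return [c for _, c in ranked[:k]]
```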

Information Retrieval with Q&A Chain

  • For information retrieval with a Q&A chain, use a large language model from the Hugging Face Hub (Google's Flan-T5 model instead of OpenAI's Davinci models).
  • Pass the large language model to the chain.
  • The retriever is the index that was created earlier with VectorstoreIndexCreator.
  • Use the chain.run function to run a query or prompt.
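The overall retrieval-plus-answer flow can be sketched with stubs; `retriever` and `llm` here are hypothetical callables standing in for the index retriever and the Flan-T5 model:

```python
def retrieval_qa(llm, retriever, question):
    # 1. Fetch the chunks most relevant to the question.
    docs = retriever(question)
    # 2. Stuff them into a prompt and ask the LLM to answer from them.
    context = "\n".join(docs)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return llm(prompt)
```

This mirrors the two stages described above: the retriever does the embedding-based lookup, and the LLM only sees the retrieved text.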

Getting Responses

  • The response is based on the information in the documents retrieved for the query.
  • Play around with max_length and other generation parameters to get better responses.
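As an example of such parameters, a sketch of generation settings; the specific values are illustrative, and passing them via `model_kwargs` assumes the legacy LangChain HuggingFaceHub wrapper (e.g. `HuggingFaceHub(repo_id="google/flan-t5-xl", model_kwargs=GEN_KWARGS)`):

```python
# Hypothetical generation settings: a larger max_length allows longer
# answers; a lower temperature makes the model stick closer to the text.
GEN_KWARGS = {"temperature": 0.1, "max_length": 256}
```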
Video description

In this video, I will show you how to interact with your data using LangChain without the need for OpenAI APIs, for absolutely free. We will be making use of Hugging Face Hub embeddings for transforming our documents into vector representations (embeddings). For large language models, we will again be using open-sourced models instead of OpenAI models (text-davinci-003, ChatGPT, etc.). All the steps will be performed with FREE & open source tools with LangChain.

▬▬▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT ▬▬▬▬▬▬▬▬▬▬▬
☕ Buy me a Coffee: https://ko-fi.com/promptengineering
🔴 Support my work on Patreon: Patreon.com/PromptEngineering
🦾 Discord: https://discord.com/invite/t4eYQRUcXB
▶️️ Subscribe: https://www.youtube.com/@engineerprompt?sub_confirmation=1
📧 Business Contact: engineerprompt@gmail.com
💼 Consulting: https://calendly.com/engineerprompt/consulting-call
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Links:
Google Notebook: https://colab.research.google.com/drive/1NaEyuFWCkDtkufHIsWQhfgPLHybXeYA1?usp=sharing
LangChain: https://python.langchain.com/en/latest/index.html
-------------------------------------------------
☕ Buy me a Coffee: https://ko-fi.com/promptengineering
Join the Patreon: patreon.com/PromptEngineering
-------------------------------------------------
All Interesting Videos:
Everything LangChain: https://www.youtube.com/playlist?list=PLVEEucA9MYhOu89CX8H3MBZqayTbcCTMr
Everything LLM: https://youtube.com/playlist?list=PLVEEucA9MYhNF5-zeb4Iw2Nl1OKTH-Txw
Everything Midjourney: https://youtube.com/playlist?list=PLVEEucA9MYhMdrdHZtFeEebl20LPkaSmw
AI Image Generation: https://youtube.com/playlist?list=PLVEEucA9MYhPVgYazU5hx6emMXtargd4z
