AI Agent for Transcribing & Summarizing Video | Agentic AI Project | Euron


AI-Powered Video Summary Tool Overview

Introduction to the AI Video Summary Tool

  • This tool allows users to upload videos and receive concise summaries quickly, saving time for those who need only the main points.
  • The process involves extracting audio from the video, converting speech to text using AI, and generating a clear summary.
  • It is particularly beneficial for students, teachers, content creators, and anyone looking to understand video content more efficiently.

Project Setup

Creating the Project Environment

  • The project begins with creating an empty folder named "euron AI agent" for organizing files related to transcribing and summarizing videos.
  • Users are advised to set up a virtual environment using either Anaconda or venv modules; both methods are suitable depending on user familiarity.

Installing Anaconda Navigator

  • For those unfamiliar with creating environments, downloading Anaconda Navigator is recommended, as it simplifies the setup process. Users can search for it in a browser and follow the installation instructions.
  • After installation, users should access the Anaconda Prompt to create their working environment.

Environment Creation Steps

Activating VS Code

  • Once inside the project folder in the terminal (e.g., via cd), users can open Visual Studio Code from that location by typing code . (the trailing dot opens the current folder). This integrates coding tools into the workflow seamlessly.

Creating a New Environment

  • To create a new environment with Conda, users execute a command specifying the environment name (e.g., uron_video_summarizer) along with Python version 3.9: conda create -n uron_video_summarizer python=3.9. This ensures all dependencies are managed within an isolated space.

Installing Required Packages

Initial Package Installation

  • After activating the newly created environment (conda activate uron_video_summarizer), essential packages must be installed:
  • FFmpeg: A crucial tool for handling audio/video operations; install it using conda install -c conda-forge ffmpeg. Ensure you are in your activated environment during this step.

Additional Dependencies

  • Next, install PyTorch specifically configured for CPU usage with:

conda install pytorch torchvision torchaudio cpuonly -c pytorch

This package is vital for machine learning tasks involved in processing video data.

Speech Recognition Engine Installation

  • Install OpenAI Whisper, which serves as the speech recognition engine for transcribing audio into text:

pip install openai-whisper

This enables effective conversion of spoken language into written format needed for summarization tasks.

Text Summarization Tools

  • Finally, install Transformers from Hugging Face which provides models used for text summarization:

pip install transformers

This package will facilitate generating concise summaries from extracted text data after transcription is complete.

Alternative Audio/Video Processing Library

  • As an alternative to FFmpeg, MoviePy can also be installed:

pip install moviepy

This library assists in further manipulating audio and video streams within Python projects.

Installation and Project Structure Setup

Overview of Installation Completion

  • The installation process has been completed successfully, marking the transition to creating a project structure.
  • Currently, there is only one file present in the folder named notes.txt, which can be used for comments or notes.

Creating Essential Files

  • The first file created is main.py, which orchestrates the overall flow of the project and serves as the entry point when running it.
  • A second file, transcriber.py, is introduced to handle audio extraction and transcription from video files. This file will manage all related functionalities.
  • Another file called summarizer.py is created to manage text summarization tasks within the project.
  • A utility file named utils.py will contain helper functions that support various aspects of the project.
  • Finally, an app.py file is designated for frontend operations using Streamlit, integrating with other components of the project.
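Putting these files together, the resulting layout might look like this (the folder name is an assumption based on the earlier setup step):

```
euron_ai_agent/
├── main.py          # entry point; orchestrates the overall flow
├── transcriber.py   # audio extraction + Whisper transcription
├── summarizer.py    # text summarization with Transformers
├── utils.py         # helper functions (e.g., chunking)
├── app.py           # Streamlit front end
└── notes.txt        # free-form notes and comments
```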

Starting with Transcriber Script

Initializing Transcriber

  • The focus shifts to developing the transcriber script by importing necessary modules: subprocess, whisper, and os. These are essential for executing shell commands, handling speech-to-text conversion, and interacting with the operating system respectively.

Defining Audio Extraction Method

  • A method called extract_audio is defined to take two parameters: a video path (string) and an audio path (string). It aims to extract audio from a specified video file into a temporary audio format (temp_audio.wav).

Handling Existing Audio Files

  • Before extracting new audio, existing files at the target location are checked; if they exist, they are removed to prevent conflicts during processing. This ensures that old data does not interfere with new extractions.

Command Definition for Audio Processing

  • An important command utilizing FFmpeg is constructed as a list:
  • The command specifies input video via ffmpeg -i <video_path>.
  • Quality settings are adjusted using flags like -q:a 0 for high-quality output.
  • The command includes mapping options (-map a) to select specific audio tracks from videos.
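Put together, the command list described above might look like this in Python (the file paths are placeholders):

```python
video_path = "sample_video.mp4"   # placeholder input video
audio_path = "temp_audio.wav"     # temporary audio output described earlier

# Build the FFmpeg invocation as a list so no shell quoting is needed:
#   -i      input video file
#   -q:a 0  highest-quality variable-bitrate audio
#   -map a  select only the audio stream(s)
command = ["ffmpeg", "-i", video_path, "-q:a", "0", "-map", "a", audio_path]
```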

FFmpeg and Whisper Integration for Audio Processing

Overview of FFmpeg Command Execution

  • The discussion begins with executing the FFmpeg command via Python's subprocess module, which can spawn processes, connect to their input/output/error pipes, and obtain their return codes.
  • Emphasis is placed on utilizing the subprocess module to execute shell commands, which allows for streamlined command execution in Python.
  • The command structure is outlined: specifying FFmpeg as the multimedia processing tool, defining the input video file, setting audio quality, selecting audio tracks, and determining output file locations.

Silent Execution and Error Handling

  • The process involves executing commands silently by configuring subprocess parameters such as standard output (STDOUT) and standard error (STDERR).
  • Use of subprocess.DEVNULL ensures that outputs are suppressed during execution to maintain a clean console interface.
  • Setting check=True raises exceptions if the FFmpeg command fails, allowing for better error management.
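A minimal sketch of extract_audio combining these pieces (the exact signature used in the video may differ slightly):

```python
import os
import subprocess


def extract_audio(video_path: str, audio_path: str = "temp_audio.wav") -> str:
    """Extract the audio track from a video into a WAV file via FFmpeg."""
    # Remove any leftover file so old audio cannot interfere with a new run.
    if os.path.exists(audio_path):
        os.remove(audio_path)

    command = ["ffmpeg", "-i", video_path, "-q:a", "0", "-map", "a", audio_path]
    # DEVNULL suppresses console output; check=True raises CalledProcessError
    # if FFmpeg exits with a non-zero status, enabling better error handling.
    subprocess.run(command, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=True)
    return audio_path
```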

Functionality of Audio Extraction

  • The method returns the path of extracted audio after successful execution of the FFmpeg command.
  • A new function called transcribe_audio is introduced to handle transcription once audio extraction is complete.

Transcription Process Using Whisper Model

  • The transcription function will utilize an OpenAI Whisper model to convert audio into text format.
  • Parameters include passing the audio path and model size; different sizes like tiny, base, small, medium, and large are available with 'base' being selected here.

Generating Transcript from Audio

  • After loading the Whisper model, transcription occurs through a call to model.transcribe, extracting text from the provided audio path.
  • The final transcript is returned after processing; two main functions have been defined: one for extracting audio and another for transcribing it using Whisper.
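The transcription function can be sketched as follows; the lazy import is a stylistic assumption so the module loads even where Whisper is not installed:

```python
def transcribe_audio(audio_path: str, model_size: str = "base") -> str:
    """Transcribe a WAV file to text with OpenAI Whisper."""
    import whisper  # imported lazily so the module loads without Whisper present

    model = whisper.load_model(model_size)  # tiny / base / small / medium / large
    result = model.transcribe(audio_path)   # returns a dict including the full text
    return result["text"]
```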

Setting Up Summarization with Transformers

Introduction to Text Summarization

  • Following transcription completion, attention shifts towards summarizing transcribed content using a summarizer function.

Importing Required Libraries

How to Summarize Text Using Transformers

Defining the Summarization Function

  • A function named summarize_text is defined, which takes two parameters: text (a string) and model_name (also a string).
  • The speaker mentions creating an account on Hugging Face to access models for summarization.
  • The model being used is identified as "facebook/bart-large-cnn," which is specifically designed for summarization tasks.

Setting Up Model Parameters

  • The model will utilize a maximum length of 150 tokens and a minimum length of 30 tokens for the summary output.
  • The function accepts four parameters: original text, model name, max length, and min length. This allows flexibility in testing different models.

Initializing the Summarizer Pipeline

  • The pipeline for summarization is initialized using the specified model name and parameters.
  • Summary generation involves calling the summarizer with input text along with defined max and min lengths.

Generating Deterministic Summaries

  • The parameter do_sample is set to false to ensure deterministic outputs; this means that the same input will always yield the same summary.
  • After processing, the function returns a summarized string based on extracted summary text from the pipeline.
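A sketch of summarize_text matching the four-parameter description above (the lazy import of transformers is an assumption to keep module load cheap):

```python
def summarize_text(text: str,
                   model_name: str = "facebook/bart-large-cnn",
                   max_length: int = 150,
                   min_length: int = 30) -> str:
    """Summarize text with a Hugging Face summarization pipeline."""
    from transformers import pipeline  # lazy import; requires `pip install transformers`

    summarizer = pipeline("summarization", model=model_name)
    # do_sample=False makes the output deterministic for a given input.
    result = summarizer(text, max_length=max_length,
                        min_length=min_length, do_sample=False)
    return result[0]["summary_text"]
```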

Overview of Audio Transcription Process

  • Summarization condenses the transcriptions produced by the transcriber, which extracts audio from video files and converts it to text with the Whisper model.
  • Following transcription, this text is passed into the summarization model for concise representation.

Implementing Chunking Logic in Utilities

Chunking and Summarization Techniques

Overview of Chunking Functionality

  • The discussion begins with the need to break down large text into smaller chunks for effective summarization.
  • A chunked-summarization helper is introduced, which will handle the summarization of these smaller text segments.
  • The first method, chunk_text, takes three parameters: the input text, chunk size (default set to 2000 characters), and overlap size (default set to 200 characters).

Importance of Overlap in Chunking

  • Overlap ensures continuity between consecutive chunks, maintaining context across splits.
  • The overlap is defined as the number of characters shared between successive chunks to preserve narrative flow.

Logic Behind Text Chunking

  • An empty list named chunks is initialized to store resulting text segments.
  • A loop iterates through the entire text until all content is covered, calculating start and end indices for each chunk.
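The chunking loop described above can be sketched like this, using the default sizes from the transcript:

```python
def chunk_text(text: str, chunk_size: int = 2000, overlap: int = 200) -> list:
    """Split text into overlapping chunks to preserve context across splits."""
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            break              # final chunk covers the rest of the text
        start = end - overlap  # step back so consecutive chunks share `overlap` chars
    return chunks
```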

Summarization Methodology

  • Another method called chunked_summary is defined, which also takes four parameters including the original text and a summarization function.
  • This method calls a summarization function on each chunk created by chunk_text, returning a final summarized string after processing all chunks.
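A sketch of chunked_summary; the inline chunking repeats the chunk_text logic so this snippet is self-contained, and the joining with spaces is an assumption:

```python
def chunked_summary(text: str, summarize_fn,
                    chunk_size: int = 2000, overlap: int = 200) -> str:
    """Summarize long text by summarizing overlapping chunks and joining the results."""
    # Minimal inline chunker (same logic as chunk_text, repeated for self-containment).
    chunks, start = [], 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            break
        start = end - overlap

    # Summarize each chunk individually, then combine into one final string.
    summaries = [summarize_fn(chunk) for chunk in chunks]
    return " ".join(summaries)
```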

Integration with Main Application

  • After defining utility functions, attention shifts to integrating them within the main application file (main.py).

Video to Summary Method Overview

Defining the Video to Summary Method

  • The method being defined is for converting video content into a summary, referred to as "video to summary."
  • Key parameters include the video path (string), model size (string), and summarizer model name (specifically Facebook's BART large CNN).
  • The method will also accept a boolean parameter called use_chunking, which defaults to false, indicating whether chunk summarization is needed.
  • The return type of this method is a string that contains the extracted summary from the video content.

Step 1: Extracting Audio from Video

  • The first step involves calling an extract_audio method with two parameters: the video path and audio path.
  • An example audio path provided is temp_audio.wav, which indicates where the extracted audio will be stored in the current directory.

Step 2: Transcribing Audio

  • After extracting audio, transcription occurs using a method called transcribe_audio, which requires both the audio path and model size as inputs.
  • A variable named transcript is created to store the result of this transcription process.

Step 3: Summarizing Transcripts

  • This step involves summarizing the transcribed text. It breaks down long transcripts into manageable chunks for easier processing.
  • The utility function chunked_summary is used, which processes each chunk individually before combining them into one final summary string.
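The three steps above can be sketched as a single orchestration function in main.py; the helper imports follow the project files described earlier and are assumptions about the exact module layout:

```python
def video_to_summary(video_path: str,
                     model_size: str = "base",
                     summarizer_model_name: str = "facebook/bart-large-cnn",
                     use_chunking: bool = False) -> str:
    """Extract audio, transcribe it, and return a summary of the transcript."""
    # Assumed helper modules from the project structure described earlier.
    from transcriber import extract_audio, transcribe_audio
    from summarizer import summarize_text
    from utils import chunked_summary

    audio_path = extract_audio(video_path, "temp_audio.wav")   # step 1: audio
    transcript = transcribe_audio(audio_path, model_size)      # step 2: text
    if use_chunking:                                           # step 3: summary
        return chunked_summary(
            transcript, lambda t: summarize_text(t, summarizer_model_name))
    return summarize_text(transcript, summarizer_model_name)
```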

Final Steps: Cleaning Up

Video Summarization Process Overview

Execution of the Script

  • The final summary is returned once the script runs; a few small fixes are made to the code before execution.
  • Execution is wrapped in the standard if __name__ == "__main__": guard so the pipeline runs only when the script is invoked directly.
  • The video file, model size, summarizer model name, and use_chunking flag are passed in when the main process triggers execution.

Sample Video for Data Extraction

  • An example video file is introduced for data extraction purposes; it will serve as a sample for testing.
  • A brief description of what a data scientist does is provided: solving problems using data and leveraging various fields like statistics and machine learning.

Running the Main Script

  • Instructions are given to run the Python script (main.py) to extract audio from the provided video file.
  • Confirmation that a temporary audio file has been created from the video input, leading to generating a summary.

Summary Output Verification

  • A two-line summary is generated from a 60-second video, highlighting key aspects of data science roles and skills utilized in problem-solving.
  • The successful execution indicates readiness to integrate with Streamlit for further application development.

Setting Up Streamlit Application

  • Installing Streamlit in the same environment (pip install streamlit) is recommended before proceeding with app development.
  • Code snippets are shared for creating an app.py script that imports necessary libraries including Streamlit and defines methods for summarizing videos.

User Interaction in Streamlit App

  • The app allows users to upload videos; once uploaded, it processes them in the background to generate summaries.
  • A simple button interface triggers summarization processes within the app.
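A minimal sketch of app.py along these lines; run_app and the import from main.py are assumptions, and the lazy streamlit import keeps the module importable without Streamlit installed:

```python
import tempfile


def run_app():
    import streamlit as st             # lazy import; requires `pip install streamlit`
    from main import video_to_summary  # assumed orchestration function from main.py

    st.title("AI Video Summarizer")
    uploaded = st.file_uploader("Upload a video", type=["mp4", "mov", "avi"])

    if uploaded is not None and st.button("Summarize"):
        # Persist the upload to a temporary file so FFmpeg can read it from disk.
        with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as tmp:
            tmp.write(uploaded.read())
            video_path = tmp.name
        with st.spinner("Transcribing and summarizing..."):
            summary = video_to_summary(video_path, use_chunking=True)
        st.subheader("Summary")
        st.write(summary)
```

In the real app.py, call run_app() at the bottom of the file (or inline its body at the top level) and launch the app with streamlit run app.py.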

Running Streamlit Application

  • Instructions on how to run the Streamlit app (streamlit run app.py) are provided along with steps for uploading files.
Video description

Project resource link: https://euron.one/course/ai-agent-for-transcribing-and-summarizing-videos

Learn to build an AI video summarizer in Python with this step-by-step video lesson. The guide walks through extracting audio from videos and transcribing it with AI, implementing text summarization with simple Python scripts, setting up a structured Python project, and using Streamlit to create an interactive app, with libraries such as PyTorch, OpenAI Whisper, and Hugging Face Transformers.
CHAPTERS:
00:00 - Overview
00:44 - Creating a Project Directory
01:24 - Creating a Conda Environment
04:54 - Installing Dependencies
11:35 - Creating Project Structure
13:41 - Transcriber
16:09 - Extracting Audio from Video
23:52 - Transcribing the Audio
27:18 - Summarizing the Text
28:58 - Summarize Text Function
39:20 - Chunking Logic
42:14 - Chunking
43:42 - Chunked Summarizer
45:38 - Main
59:50 - Executing the Code
1:02:33 - Installing Streamlit
1:03:26 - Creating the Streamlit App
1:05:19 - Running the Streamlit App