AI Agent for Transcribing & Summarizing Video | Agentic AI Project | Euron
AI-Powered Video Summary Tool Overview
Introduction to the AI Video Summary Tool
- This tool allows users to upload videos and receive concise summaries quickly, saving time for those who need only the main points.
- The process involves extracting audio from the video, converting speech to text using AI, and generating a clear summary.
- It is particularly beneficial for students, teachers, content creators, and anyone looking to understand video content more efficiently.
Project Setup
Creating the Project Environment
- The project begins with creating an empty folder named "uron AI agent" for organizing files related to transcribing and summarizing videos.
- Users are advised to set up a virtual environment using either Anaconda or venv modules; both methods are suitable depending on user familiarity.
Installing Anaconda Navigator
- For those unfamiliar with creating environments, downloading Anaconda Navigator is recommended as it simplifies the setup process. Users can search for it in Chrome and follow installation instructions.
- After installation, users should access the Anaconda Prompt to create their working environment.
Environment Creation Steps
Activating VS Code
- Once inside the project folder via terminal commands (e.g., `cd`), users can open Visual Studio Code directly from that location by typing `code .`. This integrates coding tools into their workflow seamlessly.
Creating a New Environment
- To create a new environment using Conda, users execute a command specifying the environment name along with Python version 3.9: `conda create -n uron_video_summarizer python=3.9`. This step ensures all dependencies are managed within this isolated space.
Installing Required Packages
Initial Package Installation
- After activating the newly created environment (`conda activate uron_video_summarizer`), essential packages must be installed:
- FFmpeg: a crucial tool for handling audio/video operations; install it with `conda install -c conda-forge ffmpeg`. Ensure you are in your activated environment during this step.
Additional Dependencies
- Next, install PyTorch configured for CPU usage with `conda install pytorch torchvision torchaudio cpuonly -c pytorch`. This package is vital for the machine learning tasks involved in processing video data.
Speech Recognition Engine Installation
- Install OpenAI Whisper, which serves as the speech recognition engine for transcribing audio into text: `pip install git+https://github.com/openai/whisper.git`. This enables conversion of spoken language into the written form needed for summarization.
Text Summarization Tools
- Finally, install Transformers from Hugging Face, which provides the models used for text summarization: `pip install transformers`. This package will generate concise summaries from the transcribed text.
Alternative Audio/Video Processing Library
- As an alternative to FFmpeg, MoviePy can also be installed with `pip install moviepy`. This library assists in manipulating audio and video streams within Python projects.
Installation and Project Structure Setup
Overview of Installation Completion
- The installation process has been completed successfully, marking the transition to creating a project structure.
- Currently, there is only one file present in the folder, `notes.txt`, which can be used for comments or notes.
Creating Essential Files
- The first file created is `main.py`, which orchestrates the overall flow of the project and serves as the entry point when running it.
- A second file, `transcriber.py`, is introduced to handle audio extraction and transcription from video files. This file will manage all related functionality.
- Another file called `summarizer.py` is created to manage text summarization tasks within the project.
- A utility file named `utils.py` will contain helper functions that support various parts of the project.
- Finally, an `app.py` file is designated for the frontend, built with Streamlit and integrating the other components of the project.
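Based on the files listed above, the resulting project layout might look like this:

```text
uron AI agent/
├── main.py         # entry point; orchestrates the pipeline
├── transcriber.py  # audio extraction + Whisper transcription
├── summarizer.py   # text summarization
├── utils.py        # chunking helpers
├── app.py          # Streamlit frontend
└── notes.txt       # scratch notes
```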
Starting with Transcriber Script
Initializing Transcriber
- The focus shifts to developing the transcriber script by importing the necessary modules: `subprocess`, `whisper`, and `os`. These are used for executing shell commands, speech-to-text conversion, and interacting with the operating system, respectively.
Defining Audio Extraction Method
- A method called `extract_audio` is defined, taking two parameters: a video path (string) and an audio path (string). It extracts the audio from the specified video file into a temporary audio file (`temp_audio.wav`).
Handling Existing Audio Files
- Before extracting new audio, existing files at the target location are checked; if they exist, they are removed to prevent conflicts during processing. This ensures that old data does not interfere with new extractions.
Command Definition for Audio Processing
- An important command utilizing FFmpeg is constructed as a list:
- The command specifies the input video via `ffmpeg -i <video_path>`.
- Audio quality is set using `-q:a 0` for high-quality output.
- The mapping option `-map a` selects only the audio tracks from the video.
FFmpeg and Whisper Integration for Audio Processing
Overview of FFmpeg Command Execution
- The discussion begins with executing the FFmpeg command through Python's `subprocess` module, which can spawn new processes, connect to their input/output/error pipes, and obtain their return codes.
- Using `subprocess` allows shell commands to be executed from Python in a streamlined way.
- The command structure is outlined: specifying FFmpeg as the multimedia processing tool, defining the input video file, setting audio quality, selecting audio tracks, and determining output file locations.
Silent Execution and Error Handling
- The process involves executing commands silently by configuring subprocess parameters such as standard output (STDOUT) and standard error (STDERR).
- Using `subprocess.DEVNULL` suppresses outputs during execution to maintain a clean console interface.
- Setting `check=True` raises an exception if the FFmpeg command fails, allowing for better error management.
Functionality of Audio Extraction
- The method returns the path of extracted audio after successful execution of the FFmpeg command.
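Putting the bullets above together, a minimal sketch of `extract_audio` could look like the following. The flags mirror the ones described; the helper `build_ffmpeg_cmd` is added here for clarity and is not named in the original.

```python
import os
import subprocess

def build_ffmpeg_cmd(video_path: str, audio_path: str) -> list:
    """Build the FFmpeg command list: -q:a 0 keeps high audio quality,
    -map a selects only the audio streams."""
    return ["ffmpeg", "-i", video_path, "-q:a", "0", "-map", "a", audio_path]

def extract_audio(video_path: str, audio_path: str = "temp_audio.wav") -> str:
    """Extract the audio track from a video into a temporary WAV file."""
    if os.path.exists(audio_path):
        os.remove(audio_path)  # remove stale output so old data never interferes
    subprocess.run(
        build_ffmpeg_cmd(video_path, audio_path),
        stdout=subprocess.DEVNULL,   # silent execution: suppress console output
        stderr=subprocess.DEVNULL,
        check=True,                  # raise CalledProcessError if FFmpeg fails
    )
    return audio_path  # path of the extracted audio
```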
- A new function called `transcribe_audio` is introduced to handle transcription once audio extraction is complete.
Transcription Process Using Whisper Model
- The transcription function will utilize an OpenAI Whisper model to convert audio into text format.
- Parameters include passing the audio path and model size; different sizes like tiny, base, small, medium, and large are available with 'base' being selected here.
Generating Transcript from Audio
- After loading the Whisper model, transcription occurs through a call to `model.transcribe`, which extracts text from the provided audio path.
- The final transcript is returned after processing; two main functions have been defined: one for extracting audio and another for transcribing it using Whisper.
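A sketch of the transcription function, following the steps above. The `whisper` import is deferred into the function so the sketch can be read and loaded without the model installed; in the actual `transcriber.py` it would sit at the top of the file.

```python
def transcribe_audio(audio_path: str, model_size: str = "base") -> str:
    """Transcribe an audio file with OpenAI Whisper and return the text."""
    import whisper  # deferred import; normally placed at module top

    # Available sizes: tiny, base, small, medium, large ("base" is used here)
    model = whisper.load_model(model_size)
    result = model.transcribe(audio_path)
    return result["text"]  # the final transcript
```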
Setting Up Summarization with Transformers
Introduction to Text Summarization
- Following transcription completion, attention shifts towards summarizing transcribed content using a summarizer function.
Importing Required Libraries
How to Summarize Text Using Transformers
Defining the Summarization Function
- A function named `summarize_text` is defined, which takes two parameters: `text` (a string) and `model_name` (also a string).
- The speaker mentions creating an account on Hugging Face to access models for summarization.
- The model being used is identified as "facebook/bart-large-cnn," which is specifically designed for summarization tasks.
Setting Up Model Parameters
- The model will utilize a maximum length of 150 tokens and a minimum length of 30 tokens for the summary output.
- The function accepts four parameters: original text, model name, max length, and min length. This allows flexibility in testing different models.
Initializing the Summarizer Pipeline
- The pipeline for summarization is initialized using the specified model name and parameters.
- Summary generation involves calling the `summarizer` pipeline with the input text along with the defined max and min lengths.
Generating Deterministic Summaries
- The parameter `do_sample` is set to `False` to ensure deterministic output; the same input will always yield the same summary.
- After processing, the function returns a summarized string based on extracted summary text from the pipeline.
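A sketch of `summarize_text` assembled from the parameters described above. As with the transcriber, the `transformers` import is deferred so the sketch loads without the library; in the real `summarizer.py` it would be a top-level import.

```python
def summarize_text(text: str,
                   model_name: str = "facebook/bart-large-cnn",
                   max_length: int = 150,
                   min_length: int = 30) -> str:
    """Summarize text with a Hugging Face summarization pipeline."""
    from transformers import pipeline  # deferred import; normally at module top

    summarizer = pipeline("summarization", model=model_name)
    result = summarizer(text,
                        max_length=max_length,
                        min_length=min_length,
                        do_sample=False)  # deterministic: same input, same summary
    return result[0]["summary_text"]
```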
Overview of Audio Transcription Process
- Summarization condenses the transcriptions produced by the transcriber, which extracts audio from video files and transcribes it with the Whisper model.
- Following transcription, this text is passed into the summarization model for concise representation.
Implementing Chunking Logic in Utilities
Chunking and Summarization Techniques
Overview of Chunking Functionality
- The discussion begins with the need to break down large text into smaller chunks for effective summarization.
- A chunked-summarization utility is introduced, which will handle summarizing these smaller text segments.
- The first method, `chunk_text`, takes three parameters: the input text, chunk size (default 2000 characters), and overlap size (default 200 characters).
Importance of Overlap in Chunking
- Overlap ensures continuity between consecutive chunks, maintaining context across splits.
- The overlap is defined as the number of characters shared between successive chunks to preserve narrative flow.
Logic Behind Text Chunking
- An empty list named `chunks` is initialized to store the resulting text segments.
- A loop iterates through the entire text until all content is covered, calculating start and end indices for each chunk.
Summarization Methodology
- Another method called `chunked_summary` is defined, which takes four parameters, including the original text and a summarization function.
- This method calls the summarization function on each chunk created by `chunk_text`, returning a final summarized string after processing all chunks.
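The two utility methods described above might be sketched as follows (the exact joining of partial summaries is an assumption; here they are simply concatenated with spaces):

```python
def chunk_text(text: str, chunk_size: int = 2000, overlap: int = 200) -> list:
    """Split text into chunks of chunk_size characters, where consecutive
    chunks share `overlap` characters to preserve context across splits."""
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            break  # the last chunk already reached the end of the text
        start = end - overlap  # step back by the overlap to keep continuity
    return chunks

def chunked_summary(text: str, summarize_fn,
                    chunk_size: int = 2000, overlap: int = 200) -> str:
    """Summarize each chunk with summarize_fn, then join the partial summaries."""
    parts = [summarize_fn(chunk) for chunk in chunk_text(text, chunk_size, overlap)]
    return " ".join(parts)
```

For example, with a chunk size of 4 and an overlap of 1, the string `"abcdefghij"` splits into `["abcd", "defg", "ghij"]` — each chunk starts on the last character of the previous one.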
Integration with Main Application
- After defining the utility functions, attention shifts to integrating them within the main application file (`main.py`).
Video to Summary Method Overview
Defining the Video to Summary Method
- The method being defined is for converting video content into a summary, referred to as "video to summary."
- Key parameters include the video path (string), model size (string), and summarizer model name (specifically `facebook/bart-large-cnn`).
- The method also accepts a boolean parameter called `use_chunking`, which defaults to `False`, indicating whether chunked summarization is needed.
- The method returns a string containing the summary extracted from the video content.
Step 1: Extracting Audio from Video
- The first step involves calling the `extract_audio` method with two parameters: the video path and the audio path.
- The example audio path is `temp_audio.wav`, indicating where the extracted audio will be stored in the current directory.
Step 2: Transcribing Audio
- After extracting audio, transcription occurs using the `transcribe_audio` method, which requires both the audio path and the model size as inputs.
- A transcript variable is created to store the result of this transcription.
Step 3: Summarizing Transcripts
- This step involves summarizing the transcribed text. It breaks down long transcripts into manageable chunks for easier processing.
- The utility function `chunked_summary` is used, which processes each chunk individually before combining them into one final summary string.
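The three steps above can be sketched as a single pipeline function. In the actual `main.py` the concrete functions would be imported from `transcriber.py`, `summarizer.py`, and `utils.py`; here they are passed in as parameters so the flow can be shown standalone, and the temp-file cleanup at the end reflects the final step.

```python
import os

def video_to_summary(video_path, extract_audio, transcribe_audio, summarize,
                     audio_path="temp_audio.wav", use_chunking=False,
                     chunked_summary=None):
    """Run the extract -> transcribe -> summarize pipeline and clean up."""
    # Step 1: extract the audio track from the video
    audio = extract_audio(video_path, audio_path)
    # Step 2: transcribe the audio to text
    transcript = transcribe_audio(audio)
    # Step 3: summarize, chunking long transcripts when requested
    if use_chunking and chunked_summary is not None:
        summary = chunked_summary(transcript, summarize)
    else:
        summary = summarize(transcript)
    # Final step: remove the temporary audio file
    if os.path.exists(audio_path):
        os.remove(audio_path)
    return summary
```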
Final Steps: Cleaning Up
Video Summarization Process Overview
Execution of the Script
- The final summary is returned after the full pipeline runs; a few fixes are made to the execution code before running it.
- The script is executed through a main entry point (an `if __name__ == "__main__":` block) rather than by calling the function directly.
- The video file, model size, summarizer model name, and `use_chunking` flag are passed in when execution is triggered by the main process.
Sample Video for Data Extraction
- An example video file is introduced for data extraction purposes; it will serve as a sample for testing.
- A brief description of what a data scientist does is provided: solving problems using data and leveraging various fields like statistics and machine learning.
Running the Main Script
- Instructions are given to run the Python script (`main.py`) to extract audio from the provided video file.
- Confirmation that a temporary audio file has been created from the video input, leading to generating a summary.
Summary Output Verification
- A two-line summary is generated from a 60-second video, highlighting key aspects of data science roles and skills utilized in problem-solving.
- The successful execution indicates readiness to integrate with Streamlit for further application development.
Setting Up Streamlit Application
- Installation of Streamlit in the same environment is recommended before proceeding with app development.
- Code snippets are shared for creating an `app.py` script that imports the necessary libraries, including Streamlit, and defines methods for summarizing videos.
User Interaction in Streamlit App
- The app allows users to upload videos; once uploaded, it processes them in the background to generate summaries.
- A simple button interface triggers summarization processes within the app.
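A minimal sketch of what `app.py` might look like, assuming the pipeline function `video_to_summary` lives in `main.py` (the exact names and accepted file types are assumptions). The Streamlit import sits inside `main()` here only so the sketch can be defined without Streamlit installed; in a real `app.py` the imports would be at the top and `main()` would be called at module level.

```python
import tempfile

def main():
    import streamlit as st                 # normally a top-level import
    from main import video_to_summary     # assumed pipeline function in main.py

    st.title("AI Video Summary Tool")
    uploaded = st.file_uploader("Upload a video", type=["mp4", "mov", "avi"])
    # A simple button triggers the summarization in the background
    if uploaded is not None and st.button("Summarize"):
        # Save the upload to a temp file so the pipeline can read it from disk
        with tempfile.NamedTemporaryFile(delete=False, suffix=".mp4") as tmp:
            tmp.write(uploaded.read())
            video_path = tmp.name
        with st.spinner("Transcribing and summarizing..."):
            summary = video_to_summary(video_path)
        st.subheader("Summary")
        st.write(summary)

# In app.py, call main() at module level and launch with: streamlit run app.py
```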
Running Streamlit Application
- Instructions are provided for running the Streamlit app (`streamlit run app.py`), along with steps for uploading files.