AI Agent for Transcribing & Summarizing Video | Agentic AI Project | Euron
AI-Powered Video Summary Tool Overview
Introduction to the AI Video Summary Tool
- This tool allows users to upload videos and receive concise summaries quickly, saving time for those who need only the main points.
- The process involves extracting audio from the video, converting speech to text using AI, and generating a clear summary.
- It is particularly beneficial for students, teachers, content creators, and anyone looking to understand video content more efficiently.
Project Setup
Creating the Project Environment
- The project begins with creating an empty folder named "uron AI agent" for organizing files related to transcribing and summarizing videos.
- Users are advised to set up a virtual environment using either Anaconda or venv modules; both methods are suitable depending on user familiarity.
Installing Anaconda Navigator
- For those unfamiliar with creating environments, downloading Anaconda Navigator is recommended as it simplifies the setup process. Users can search for it in Chrome and follow installation instructions.
- After installation, users should access the Anaconda Prompt to create their working environment.
Environment Creation Steps
Activating VS Code
- Once inside the project folder via terminal commands (e.g., `cd`), users can open Visual Studio Code directly from that location by typing `code .`. This integrates coding tools into their workflow seamlessly.
Creating a New Environment
- To create a new environment using Conda, users execute a command specifying the environment name along with Python version 3.9: `conda create -n uron_video_summarizer python=3.9`. This step ensures all dependencies are managed within this isolated space.
Installing Required Packages
Initial Package Installation
- After activating the newly created environment (`conda activate uron_video_summarizer`), essential packages must be installed:
- FFmpeg: a crucial tool for handling audio/video operations; install it with `conda install -c conda-forge ffmpeg`. Ensure you are in your activated environment during this step.
Additional Dependencies
- Next, install PyTorch configured for CPU usage with `conda install pytorch torchvision torchaudio cpuonly -c pytorch`. This package is vital for the machine learning tasks involved in processing video data.
Speech Recognition Engine Installation
- Install OpenAI Whisper, which serves as the speech recognition engine for transcribing audio into text: `pip install git+https://github.com/openai/whisper.git`. This enables conversion of spoken language into the written form needed for summarization.
Text Summarization Tools
- Finally, install Transformers from Hugging Face, which provides the models used for text summarization: `pip install transformers`. This package will generate concise summaries from the transcribed text.
Alternative Audio/Video Processing Library
- As an alternative to FFmpeg, MoviePy can also be installed with `pip install moviepy`. This library assists in manipulating audio and video streams within Python projects.
Installation and Project Structure Setup
Overview of Installation Completion
- The installation process has been completed successfully, marking the transition to creating a project structure.
- Currently, there is only one file present in the folder, `notes.txt`, which can be used for comments or notes.
Creating Essential Files
- The first file created is `main.py`, which orchestrates the overall flow of the project and serves as the entry point when running it.
- A second file, `transcriber.py`, is introduced to handle audio extraction and transcription from video files. This file will manage all related functionality.
- Another file called `summarizer.py` is created to manage text summarization tasks within the project.
- A utility file named `utils.py` will contain helper functions that support various parts of the project.
- Finally, an `app.py` file is designated for the frontend, built with Streamlit and integrating the other components of the project.
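Based on the files listed above, the resulting project layout might look like this:

```text
uron AI agent/
├── main.py         # entry point; orchestrates the pipeline
├── transcriber.py  # audio extraction + Whisper transcription
├── summarizer.py   # text summarization
├── utils.py        # chunking helpers
├── app.py          # Streamlit frontend
└── notes.txt       # scratch notes
```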
Starting with Transcriber Script
Initializing Transcriber
- The focus shifts to developing the transcriber script by importing the necessary modules: `subprocess`, `whisper`, and `os`. These are used for executing shell commands, speech-to-text conversion, and interacting with the operating system, respectively.
Defining Audio Extraction Method
- A method called `extract_audio` is defined, taking two parameters: a video path (string) and an audio path (string). It extracts the audio from the specified video file into a temporary audio file (`temp_audio.wav`).
Handling Existing Audio Files
- Before extracting new audio, existing files at the target location are checked; if they exist, they are removed to prevent conflicts during processing. This ensures that old data does not interfere with new extractions.
Command Definition for Audio Processing
- An important command utilizing FFmpeg is constructed as a list:
- The command specifies the input video via `ffmpeg -i <video_path>`.
- Audio quality is set using `-q:a 0` for high-quality output.
- The mapping option `-map a` selects only the audio tracks from the video.
FFmpeg and Whisper Integration for Audio Processing
Overview of FFmpeg Command Execution
- The discussion begins with executing the FFmpeg command through Python's `subprocess` module, which can spawn new processes, connect to their input/output/error pipes, and obtain their return codes.
- Using `subprocess` allows shell commands to be executed from Python in a streamlined way.
- The command structure is outlined: specifying FFmpeg as the multimedia processing tool, defining the input video file, setting audio quality, selecting audio tracks, and determining output file locations.
Silent Execution and Error Handling
- The process involves executing commands silently by configuring subprocess parameters such as standard output (STDOUT) and standard error (STDERR).
- Using `subprocess.DEVNULL` suppresses outputs during execution to maintain a clean console interface.
- Setting `check=True` raises an exception if the FFmpeg command fails, allowing for better error management.
Functionality of Audio Extraction
- The method returns the path of extracted audio after successful execution of the FFmpeg command.
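Putting the bullets above together, a minimal sketch of `extract_audio` could look like the following. The flags mirror the ones described; the helper `build_ffmpeg_cmd` is added here for clarity and is not named in the original.

```python
import os
import subprocess

def build_ffmpeg_cmd(video_path: str, audio_path: str) -> list:
    """Build the FFmpeg command list: -q:a 0 keeps high audio quality,
    -map a selects only the audio streams."""
    return ["ffmpeg", "-i", video_path, "-q:a", "0", "-map", "a", audio_path]

def extract_audio(video_path: str, audio_path: str = "temp_audio.wav") -> str:
    """Extract the audio track from a video into a temporary WAV file."""
    if os.path.exists(audio_path):
        os.remove(audio_path)  # remove stale output so old data never interferes
    subprocess.run(
        build_ffmpeg_cmd(video_path, audio_path),
        stdout=subprocess.DEVNULL,   # silent execution: suppress console output
        stderr=subprocess.DEVNULL,
        check=True,                  # raise CalledProcessError if FFmpeg fails
    )
    return audio_path  # path of the extracted audio
```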
- A new function called `transcribe_audio` is introduced to handle transcription once audio extraction is complete.
Transcription Process Using Whisper Model
- The transcription function will utilize an OpenAI Whisper model to convert audio into text format.
- Parameters include passing the audio path and model size; different sizes like tiny, base, small, medium, and large are available with 'base' being selected here.
Generating Transcript from Audio
- After loading the Whisper model, transcription occurs through a call to `model.transcribe`, which extracts text from the provided audio path.
- The final transcript is returned after processing; two main functions have been defined: one for extracting audio and another for transcribing it using Whisper.
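A sketch of the transcription function, following the steps above. The `whisper` import is deferred into the function so the sketch can be read and loaded without the model installed; in the actual `transcriber.py` it would sit at the top of the file.

```python
def transcribe_audio(audio_path: str, model_size: str = "base") -> str:
    """Transcribe an audio file with OpenAI Whisper and return the text."""
    import whisper  # deferred import; normally placed at module top

    # Available sizes: tiny, base, small, medium, large ("base" is used here)
    model = whisper.load_model(model_size)
    result = model.transcribe(audio_path)
    return result["text"]  # the final transcript
```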
Setting Up Summarization with Transformers
Introduction to Text Summarization
- Following transcription completion, attention shifts towards summarizing transcribed content using a summarizer function.
Importing Required Libraries
How to Summarize Text Using Transformers
Defining the Summarization Function
- A function named `summarize_text` is defined, which takes two parameters: `text` (a string) and `model_name` (also a string).
- The speaker mentions creating an account on Hugging Face to access models for summarization.
- The model being used is identified as "facebook/bart-large-cnn," which is specifically designed for summarization tasks.
Setting Up Model Parameters
- The model will utilize a maximum length of 150 tokens and a minimum length of 30 tokens for the summary output.
- The function accepts four parameters: original text, model name, max length, and min length. This allows flexibility in testing different models.
Initializing the Summarizer Pipeline
- The pipeline for summarization is initialized using the specified model name and parameters.
- Summary generation involves calling the `summarizer` pipeline with the input text along with the defined max and min lengths.
Generating Deterministic Summaries
- The parameter `do_sample` is set to `False` to ensure deterministic output; the same input will always yield the same summary.
- After processing, the function returns a summarized string based on extracted summary text from the pipeline.
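A sketch of `summarize_text` assembled from the parameters described above. As with the transcriber, the `transformers` import is deferred so the sketch loads without the library; in the real `summarizer.py` it would be a top-level import.

```python
def summarize_text(text: str,
                   model_name: str = "facebook/bart-large-cnn",
                   max_length: int = 150,
                   min_length: int = 30) -> str:
    """Summarize text with a Hugging Face summarization pipeline."""
    from transformers import pipeline  # deferred import; normally at module top

    summarizer = pipeline("summarization", model=model_name)
    result = summarizer(text,
                        max_length=max_length,
                        min_length=min_length,
                        do_sample=False)  # deterministic: same input, same summary
    return result[0]["summary_text"]
```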
Overview of Audio Transcription Process
- Summarization condenses the transcriptions produced by the transcriber, which extracts audio from video files and transcribes it with the Whisper model.
- Following transcription, this text is passed into the summarization model for concise representation.
Implementing Chunking Logic in Utilities
Chunking and Summarization Techniques
Overview of Chunking Functionality
- The discussion begins with the need to break down large text into smaller chunks for effective summarization.
- A chunked-summarization utility is introduced, which will handle summarizing these smaller text segments.
- The first method, `chunk_text`, takes three parameters: the input text, chunk size (default 2000 characters), and overlap size (default 200 characters).
Importance of Overlap in Chunking
- Overlap ensures continuity between consecutive chunks, maintaining context across splits.
- The overlap is defined as the number of characters shared between successive chunks to preserve narrative flow.
Logic Behind Text Chunking
- An empty list named `chunks` is initialized to store the resulting text segments.
- A loop iterates through the entire text until all content is covered, calculating start and end indices for each chunk.
Summarization Methodology
- Another method called `chunked_summary` is defined, which takes four parameters, including the original text and a summarization function.
- This method calls the summarization function on each chunk created by `chunk_text`, returning a final summarized string after processing all chunks.
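The two utility methods described above might be sketched as follows (the exact joining of partial summaries is an assumption; here they are simply concatenated with spaces):

```python
def chunk_text(text: str, chunk_size: int = 2000, overlap: int = 200) -> list:
    """Split text into chunks of chunk_size characters, where consecutive
    chunks share `overlap` characters to preserve context across splits."""
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            break  # the last chunk already reached the end of the text
        start = end - overlap  # step back by the overlap to keep continuity
    return chunks

def chunked_summary(text: str, summarize_fn,
                    chunk_size: int = 2000, overlap: int = 200) -> str:
    """Summarize each chunk with summarize_fn, then join the partial summaries."""
    parts = [summarize_fn(chunk) for chunk in chunk_text(text, chunk_size, overlap)]
    return " ".join(parts)
```

For example, with a chunk size of 4 and an overlap of 1, the string `"abcdefghij"` splits into `["abcd", "defg", "ghij"]` — each chunk starts on the last character of the previous one.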
Integration with Main Application
- After defining the utility functions, attention shifts to integrating them within the main application file (`main.py`).
Video to Summary Method Overview
Defining the Video to Summary Method
- The method being defined is for converting video content into a summary, referred to as "video to summary."
- Key parameters include the video path (string), model size (string), and summarizer model name (specifically `facebook/bart-large-cnn`).
- The method also accepts a boolean parameter called `use_chunking`, which defaults to `False`, indicating whether chunked summarization is needed.
- The method returns a string containing the summary extracted from the video content.
Step 1: Extracting Audio from Video
- The first step involves calling the `extract_audio` method with two parameters: the video path and the audio path.
- The example audio path is `temp_audio.wav`, indicating where the extracted audio will be stored in the current directory.
Step 2: Transcribing Audio
- After extracting audio, transcription occurs using the `transcribe_audio` method, which requires both the audio path and the model size as inputs.
- A transcript variable is created to store the result of this transcription.
Step 3: Summarizing Transcripts
- This step involves summarizing the transcribed text. It breaks down long transcripts into manageable chunks for easier processing.
- The utility function `chunked_summary` is used, which processes each chunk individually before combining them into one final summary string.
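The three steps above can be sketched as a single pipeline function. In the actual `main.py` the concrete functions would be imported from `transcriber.py`, `summarizer.py`, and `utils.py`; here they are passed in as parameters so the flow can be shown standalone, and the temp-file cleanup at the end reflects the final step.

```python
import os

def video_to_summary(video_path, extract_audio, transcribe_audio, summarize,
                     audio_path="temp_audio.wav", use_chunking=False,
                     chunked_summary=None):
    """Run the extract -> transcribe -> summarize pipeline and clean up."""
    # Step 1: extract the audio track from the video
    audio = extract_audio(video_path, audio_path)
    # Step 2: transcribe the audio to text
    transcript = transcribe_audio(audio)
    # Step 3: summarize, chunking long transcripts when requested
    if use_chunking and chunked_summary is not None:
        summary = chunked_summary(transcript, summarize)
    else:
        summary = summarize(transcript)
    # Final step: remove the temporary audio file
    if os.path.exists(audio_path):
        os.remove(audio_path)
    return summary
```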
Final Steps: Cleaning Up
Video Summarization Process Overview
Execution of the Script
- The final summary is returned after the full pipeline runs; a few fixes are made to the execution code before running it.
- The script is executed through a main entry point (an `if __name__ == "__main__":` block) rather than by calling the function directly.
- The video file, model size, summarizer model name, and `use_chunking` flag are passed in when execution is triggered by the main process.
Sample Video for Data Extraction
- An example video file is introduced for data extraction purposes; it will serve as a sample for testing.
- A brief description of what a data scientist does is provided: solving problems using data and leveraging various fields like statistics and machine learning.
Running the Main Script
- Instructions are given to run the Python script (`main.py`) to extract audio from the provided video file.
- Confirmation that a temporary audio file has been created from the video input, leading to generating a summary.
Summary Output Verification
- A two-line summary is generated from a 60-second video, highlighting key aspects of data science roles and skills utilized in problem-solving.
- The successful execution indicates readiness to integrate with Streamlit for further application development.
Setting Up Streamlit Application
- Installation of Streamlit in the same environment is recommended before proceeding with app development.
- Code snippets are shared for creating an `app.py` script that imports the necessary libraries, including Streamlit, and defines methods for summarizing videos.
User Interaction in Streamlit App
- The app allows users to upload videos; once uploaded, it processes them in the background to generate summaries.
- A simple button interface triggers summarization processes within the app.
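A minimal sketch of what `app.py` might look like, assuming the pipeline function `video_to_summary` lives in `main.py` (the exact names and accepted file types are assumptions). The Streamlit import sits inside `main()` here only so the sketch can be defined without Streamlit installed; in a real `app.py` the imports would be at the top and `main()` would be called at module level.

```python
import tempfile

def main():
    import streamlit as st                 # normally a top-level import
    from main import video_to_summary     # assumed pipeline function in main.py

    st.title("AI Video Summary Tool")
    uploaded = st.file_uploader("Upload a video", type=["mp4", "mov", "avi"])
    # A simple button triggers the summarization in the background
    if uploaded is not None and st.button("Summarize"):
        # Save the upload to a temp file so the pipeline can read it from disk
        with tempfile.NamedTemporaryFile(delete=False, suffix=".mp4") as tmp:
            tmp.write(uploaded.read())
            video_path = tmp.name
        with st.spinner("Transcribing and summarizing..."):
            summary = video_to_summary(video_path)
        st.subheader("Summary")
        st.write(summary)

# In app.py, call main() at module level and launch with: streamlit run app.py
```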
Running Streamlit Application
- Instructions are provided for running the Streamlit app (`streamlit run app.py`), along with steps for uploading files.