Part 1: How to Build an AI Voice Agent using OpenAI Realtime API

In this video, I show you how to build and deploy an AI voice agent using OpenAI's new Realtime API (takes 10 min!). The agent takes bookings and sends the data to Make.com, where you can then run any of your other automations. I give you the full code in my GitHub repo. I also show you, step by step, how to set up Replit and deploy there so the agent is always live, how to plug in Twilio so you have a phone number that calls your AI agent, and how to connect Make.com. This is a beginner-friendly tutorial.

WATCH PART 2: https://youtu.be/ffDm4HVGuTM?si=W1nfLYgj3zsQ0RWW
WATCH PART 3: https://youtu.be/oQtBwhRLrT4?si=o56i5609Zp8Ko3eG
📺 Watch the ENTIRE series: https://www.youtube.com/playlist?list=PLi7jtY2ZZqRYE8Lvw4MuLHTZPYTA4jZHQ
📺 AI SMS Assistant: https://youtu.be/HYPw8TfL2Pg?si=CVAzhuQzsXH5T2Wa
📋 Take This Quick Survey: https://forms.gle/otAr1xUamgyYZE5y7
🛠️ Need this built? Contact: bart@supportlaunchpad.com
🗂️ Github repo: https://github.com/Barty-Bart/openai-realtime-api-voice-assistant
👉 LinkedIn: https://www.linkedin.com/in/bartlomiejslodyczka
Learn AI & Coding: Try Scrimba's AI Engineer course (20% off Pro plan with my link): https://v2.scrimba.com/the-ai-engineer-path-c02v?via=BartSlodyczka
Other related videos: https://youtu.be/ojV5_IKylEM

#openai #realtimeapi #maketutorial #replit

Note: Affiliate links support this channel through commissions.

How to Build an Inbound AI Voice Agent with OpenAI's Realtime API

Initial Call and Appointment Booking

The call begins with a customer, Bartholomew, requesting to book a car service appointment.

The AI assistant successfully schedules the appointment for Tuesday at 1 p.m., demonstrating its capability in handling customer inquiries efficiently.

Introduction to OpenAI's Realtime API

The video introduces the idea of building an inbound AI voice agent on OpenAI's new Realtime API, emphasizing its value for speech-to-speech interactions.

Although the voice API is currently expensive, the speaker expects costs to fall as adoption increases.

Capabilities of OpenAI's API

The speaker highlights various applications of the OpenAI API, including chat assistants, email automation, data processing, and now voice capabilities.

There is a trend towards creating comprehensive service experiences that cater to both technical and non-technical users looking to integrate AI into their workflows.

Real-Time Communication Mechanism

A key feature of the Realtime API is that it keeps a persistent WebSocket connection open for the duration of a conversation.

Unlike a traditional request-response API, where every message needs its own request, either side can send data over the open connection at any time, which keeps delays low and makes the interaction between the caller and the AI assistant feel seamless.
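
As a rough sketch of what opening that persistent connection involves, the helper below only assembles the WebSocket URL and headers. The endpoint, the `OpenAI-Beta` header, and the default model name are assumptions based on beta-era sample code and may have changed; `realtimeConnectionOptions` is a hypothetical helper, not something shown in the video.

```javascript
// Hypothetical helper: builds the URL and headers for a Realtime API WebSocket.
// Endpoint, header names, and default model are assumptions taken from
// beta-era samples; check the current OpenAI documentation before use.
function realtimeConnectionOptions(apiKey, model = 'gpt-4o-realtime-preview') {
  if (!apiKey) {
    throw new Error('An OpenAI API key is required');
  }
  const url = new URL('wss://api.openai.com/v1/realtime');
  url.searchParams.set('model', model);
  return {
    url: url.toString(),
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'OpenAI-Beta': 'realtime=v1',
    },
  };
}
```

With the `ws` package, the result could be passed as `new WebSocket(options.url, { headers: options.headers })`. The socket then stays open for the whole call instead of being re-created per message.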

Demonstration of Functionality

The speaker mentions creating a GitHub repository with instructions on deploying the AI assistant using Twilio and connecting it with the real-time API.

The assistant captures essential customer information such as name and service needs during phone calls, showcasing adaptability for various business requirements.

Integration with Other Tools

Integration with platforms like Make.com allows data from phone calls to be utilized across different tools and services, enhancing workflow automation.

Session Management in Twilio and Replit

Overview of Session Management

The session management feature allows handling multiple phone calls simultaneously without mixing up information between different users.

Each call is treated separately, ensuring that user data remains distinct and organized, which is crucial for businesses managing various client interactions.
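
One common way to keep calls separate is a map keyed by a per-call identifier. The sketch below assumes Twilio's call SID is used as the key and that each session stores a transcript; the function names and session fields are hypothetical.

```javascript
// One entry per active call, keyed by a per-call identifier.
// Using Twilio's call SID as the key is an assumption for illustration.
const sessions = new Map();

// Returns the session for this call, creating it on first use.
function getSession(callSid) {
  if (!sessions.has(callSid)) {
    sessions.set(callSid, { transcript: '', streamSid: null });
  }
  return sessions.get(callSid);
}

// Removes the entry when the call ends so the map does not grow forever.
function endSession(callSid) {
  return sessions.delete(callSid);
}
```

Because every lookup goes through the call's own key, text appended during one call never ends up in another caller's transcript.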

Building the Application

The speaker discusses using a GitHub repository to run the application on Replit, emphasizing ease of integration with Twilio.

Initial development was inspired by Twilio's YouTube video on connecting Twilio to OpenAI's Realtime API; the code is written in JavaScript and runs on Node.js.

Code Explanation and Deployment

The original code from Twilio’s GitHub repo serves as a foundation; however, it requires additional steps for deployment that are simplified when using Replit.

The speaker highlights the importance of crediting Twilio's resources while explaining modifications made to enhance functionality in their own version.

Setting Up on Replit

To launch the project on Replit, users can import code directly from GitHub, streamlining the setup process significantly.

After importing, users need to install dependencies via npm to ensure all necessary packages are available for running the application.

Finalizing Configuration

Users must create an .env file to store sensitive information like API keys securely. This step is essential for maintaining security during app operation.
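
A small startup check can catch a missing key before the first call comes in. This is a minimal sketch; the variable names `OPENAI_API_KEY` and `WEBHOOK_URL` are assumptions, so substitute whatever names the .env file actually defines.

```javascript
// Hypothetical variable names; use the names your .env file defines.
const REQUIRED_KEYS = ['OPENAI_API_KEY', 'WEBHOOK_URL'];

// Validates an environment object and returns only the values the app needs.
function loadConfig(env) {
  const missing = REQUIRED_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return { openAiApiKey: env.OPENAI_API_KEY, webhookUrl: env.WEBHOOK_URL };
}
```

In the app this would be called as `loadConfig(process.env)` once the .env values have been loaded, for example by the `dotenv` package.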

How to Set Up Twilio for AI Integration

Initial Setup and URL Configuration

The speaker initiates the process by running a test, confirming that the expected message will be printed when the application is online. A specific URL is designated for testing purposes.

The index file ships with a preset configuration for a sample business, "B Automotive." The focus then shifts to setting up Twilio for phone call testing and connecting it to the AI agent.

Creating a Twilio Account

Users are guided to create a free account on Twilio's website, which provides a US-based phone number and $15 credit for initial usage.

To add new numbers, users must upgrade their accounts with a minimum deposit of $20. The speaker mentions having upgraded their account to obtain Australian numbers.

Managing Phone Numbers in Twilio

Users navigate through Twilio's interface to manage active phone numbers. It's essential that the default number has voice capability indicated by a phone icon.

After selecting an active number, users are instructed to access its configuration settings, ensuring they choose options related to webhooks.

Configuring Webhooks for Incoming Calls

The configuration involves setting up a webhook that fires when someone calls the designated phone number. Twilio sends an HTTP request to the application, which then connects the call to the Realtime API.

Users copy the application's public URL from Replit and paste it into Twilio's webhook settings, appending "/incoming-call" to the end of the URL.

Finalizing Configuration and Testing

The speaker explains why "/incoming-call" must be appended: it is the route in the application code that handles incoming calls.
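
To illustrate what such a route typically sends back, the sketch below builds a TwiML reply that greets the caller and then connects the call audio to a WebSocket stream. `<Say>`, `<Connect>`, and `<Stream>` are standard TwiML verbs; the greeting text and the "/media-stream" path are assumptions, and the helper names are hypothetical.

```javascript
// Escapes characters that are not safe inside XML text or attributes.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Builds the TwiML reply for the "/incoming-call" webhook.
// The greeting and the "/media-stream" path are assumptions for illustration.
function incomingCallTwiml(host, greeting = 'Please wait while we connect you.') {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<Response>',
    `  <Say>${escapeXml(greeting)}</Say>`,
    '  <Connect>',
    `    <Stream url="wss://${escapeXml(host)}/media-stream" />`,
    '  </Connect>',
    '</Response>',
  ].join('\n');
}
```

The route handler would return this string with a `text/xml` content type, after which Twilio opens the audio stream to the given WebSocket URL.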

After saving configurations in Twilio, users are prompted to run their application again so it's accessible via the internet.

Conducting Test Calls

With everything set up, users can make test calls while monitoring the logs in Replit. The logs display incoming call notifications and conversation transcripts as they occur.

An example interaction demonstrates how an AI agent responds during a simulated service scheduling call, showcasing its ability to handle customer requests effectively.

AI Receptionist Implementation

Overview of the AI Receptionist Setup

The conversation is transcribed in real time, producing a complete text record of the interaction that can later be passed to a chat completions API call.

The phone call itself succeeded, but the webhook URL was invalid and had to be corrected before data could reach Make.com.

Code Configuration and Functionality

The code setup involves defining the system message for the AI receptionist, instructing it to gather user information such as name and service requirements through conversational prompts.

Voice options can be previewed in OpenAI's playground; users select their preferred voice (e.g., Alloy, Echo, Shimmer) for the AI assistant's responses.
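
The system message and the voice are usually sent to the Realtime API in a single configuration event once the socket opens. The sketch below assumes the beta-era `session.update` event shape; the field names may have changed since, and `buildSessionUpdate` and its prompt text are hypothetical.

```javascript
// Sketch of a "session.update" event for the Realtime API (beta event format).
// Field names follow beta-era samples and are assumptions that may have
// changed; the prompt text and voice are placeholders.
function buildSessionUpdate(systemMessage, voice = 'alloy') {
  return {
    type: 'session.update',
    session: {
      instructions: systemMessage,
      voice,
      modalities: ['text', 'audio'],
      // Twilio phone audio is 8 kHz mu-law, hence g711_ulaw in both directions.
      input_audio_format: 'g711_ulaw',
      output_audio_format: 'g711_ulaw',
      // Let the server detect when the caller starts and stops speaking.
      turn_detection: { type: 'server_vad' },
    },
  };
}
```

The event would be sent as `socket.send(JSON.stringify(buildSessionUpdate(prompt, 'shimmer')))`, so switching voices means changing only the second argument.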

Event Logging and Transcription

Events during the phone call are logged to capture both user and agent transcripts. This data is essential for later analysis and processing.

The implementation includes live audio transcription using the Whisper model, converting spoken dialogue into text in real time.
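
A transcript like this can be assembled by watching for two kinds of events on the socket. The sketch below assumes the beta-era event names and payload shapes, and that input transcription has been enabled in the session configuration; `appendTranscript` is a hypothetical helper.

```javascript
// Appends one line to the running transcript for each finished utterance.
// Event type names and payload shapes follow Realtime API beta samples and
// should be treated as assumptions.
function appendTranscript(session, event) {
  if (event.type === 'conversation.item.input_audio_transcription.completed') {
    // The caller's speech, transcribed by the speech-to-text model.
    session.transcript += `User: ${event.transcript.trim()}\n`;
  } else if (event.type === 'response.done') {
    // The agent's reply; take the first content part that carries a transcript.
    const parts = event.response?.output?.[0]?.content ?? [];
    const spoken = parts.find((part) => part.transcript)?.transcript;
    if (spoken) {
      session.transcript += `Agent: ${spoken}\n`;
    }
  }
  return session;
}
```

Events of any other type leave the transcript untouched, so the function can be called on every message the socket receives.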

Data Handling and Integration with Google Sheets

After the call, an OpenAI Chat Completions API call extracts structured information from the transcript, including the customer's name, availability, and any special notes.
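
The extraction step can be sketched as a request body that asks the model for JSON matching a fixed schema. The model name, the schema name, and the field names below are assumptions chosen to mirror the three fields mentioned above; both helpers are hypothetical.

```javascript
// Builds a Chat Completions request body that asks for structured JSON.
// Model name, schema name, and field names are assumptions for illustration.
function buildExtractionRequest(transcript, model = 'gpt-4o-mini') {
  return {
    model,
    messages: [
      { role: 'system', content: 'Extract the customer details from the call transcript.' },
      { role: 'user', content: transcript },
    ],
    response_format: {
      type: 'json_schema',
      json_schema: {
        name: 'customer_details',
        strict: true,
        schema: {
          type: 'object',
          properties: {
            customerName: { type: 'string' },
            customerAvailability: { type: 'string' },
            specialNotes: { type: 'string' },
          },
          required: ['customerName', 'customerAvailability', 'specialNotes'],
          additionalProperties: false,
        },
      },
    },
  };
}

// Reads the JSON string the model returns in the first choice.
function parseExtraction(completion) {
  return JSON.parse(completion.choices[0].message.content);
}
```

The body would be sent as a POST to `https://api.openai.com/v1/chat/completions` with a bearer token, and `parseExtraction` applied to the JSON response.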

The extracted details are then sent to Make.com through a webhook URL, and the relevant values are automatically populated into the designated fields in Google Sheets.
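
Sending the details on is an ordinary JSON POST. The sketch below is one way it might look; the payload shape is an assumption, the webhook URL comes from your own Make.com scenario, and both function names are hypothetical.

```javascript
// Builds the fetch() arguments for posting extracted details to a webhook.
// The payload shape is an assumption; map the fields on the receiving side.
function buildWebhookRequest(webhookUrl, details) {
  return {
    url: webhookUrl,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(details),
    },
  };
}

// Sends the request; fetchImpl defaults to the global fetch in Node 18+.
async function sendToWebhook(webhookUrl, details, fetchImpl = fetch) {
  const { url, init } = buildWebhookRequest(webhookUrl, details);
  const response = await fetchImpl(url, init);
  if (!response.ok) {
    throw new Error(`Webhook responded with status ${response.status}`);
  }
  return response.status;
}
```

Throwing on a non-2xx status makes a wrong or expired webhook URL show up in the logs instead of failing silently.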

Testing and Results

In the test call, the speaker plays a customer and is booked in for a full engine rebuild with a twin-turbo upgrade next Tuesday at 1 p.m.

Code Edits and Deployment Process

Utilizing ChatGPT for Code Edits

The speaker explains how to use ChatGPT to make edits to the code, such as changing the system message or the webhook URL. This lets users adapt the code to different applications without writing the changes by hand.

Deployment Setup

Instructions are provided on deploying the application, emphasizing that it is currently in testing mode and needs to be switched to live deployment. The process involves clicking a button and configuring settings.

The speaker notes that setting up the server backend is straightforward, requiring only an API key and a webhook URL to be added to an environment file (.env file).

Finalizing Deployment

After deployment, users must replace the initial URL with a new one generated during setup. A reminder is given about ensuring only one forward slash is present when copying URLs.
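
The double-slash mistake is easy to avoid by normalizing both parts before joining them. A minimal sketch follows; `joinUrl` is a hypothetical helper and the domain in the usage example is a placeholder.

```javascript
// Joins a base URL and a path with exactly one slash between them.
function joinUrl(base, path) {
  return `${base.replace(/\/+$/, '')}/${path.replace(/^\/+/, '')}`;
}

// joinUrl('https://example.replit.app/', '/incoming-call')
// returns 'https://example.replit.app/incoming-call'
```

The same result comes back whether or not the copied deployment URL ends with a slash.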

Once saved, the configuration should be tested by making a call to ensure that the live version of the AI agent functions correctly.

Future Enhancements

The speaker reflects on their quick build process aimed at helping others save time. They mention potential future improvements like integrating a knowledge base and function calls for checking calendar availability or answering FAQs.