AWS re:Invent 2025 - Build a multi-channel agentic experience with Twilio and AWS (AIM236)
Build a Multichannel Agentic Experience with Twilio and AWS
Introduction to the Collaborative Effort
- Welcome message from the speakers, highlighting the collaboration between Twilio and AWS.
- The team consists of Dan Bartlett (Principal Solutions Architect at Twilio), Brandon Hawkins (Lead PM for Voice AI at Twilio), and Cameron Keane (Solution Architect at AWS).
Understanding Agentic Experiences
- Here, a multichannel "agentic" experience means an AI-agent application that users can reach through their preferred channels, such as voice, SMS messaging, and email.
- Emphasis on enabling users to switch seamlessly between different communication channels within an application.
Architectural Overview
- A simple architectural diagram illustrates customer interactions via voice, email, and messaging backed by a large language model (LLM).
- Key principles for multichannel agentic applications include:
- Channel Choice: Users should have options among various communication methods.
- Provider Flexibility: Importance of choosing appropriate speech-to-text and text-to-speech providers based on specific needs.
- Reusable Components: Encouragement to build applications using reusable components across different channels and models.
Conceptual Framework for Application Development
- The framework starts with user interaction followed by channel selection (voice, messaging, email) leading into an API layer for bidirectional communication.
- Central focus is on the agent application where business logic resides; identity resolution and context are crucial in this layer.
- Discussion of tools available for building agentic experiences along with data products from AWS integrated into the solution architecture.
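The layering above (channel, API layer, agent application with identity resolution) can be illustrated with a minimal sketch. Everything here is hypothetical: `InboundEvent`, `IDENTITY_INDEX`, `resolve_identity`, and `to_agent_request` are illustrative names, not Twilio or AWS APIs, and the lookup table stands in for whatever customer data source an application actually uses.

```python
from dataclasses import dataclass

# Hypothetical lookup table: several channel addresses map to one customer ID.
IDENTITY_INDEX = {
    "+15550100": "cust-1",
    "ada@example.com": "cust-1",
}


@dataclass
class InboundEvent:
    channel: str  # "voice", "sms", or "email"
    sender: str   # phone number or email address
    text: str     # transcript or message body


def resolve_identity(event: InboundEvent) -> str:
    """Return a stable customer ID, or a per-address guest ID if unknown."""
    key = event.sender.lower()
    return IDENTITY_INDEX.get(key, f"guest:{key}")


def to_agent_request(event: InboundEvent) -> dict:
    """Normalize any channel's event into one shape for the agent layer."""
    return {
        "customer_id": resolve_identity(event),
        "channel": event.channel,
        "text": event.text.strip(),
    }
```

With this shape, a voice call and an email from the same person resolve to the same `customer_id`, so the business logic does not need to know which channel a request arrived on.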
Challenges in Building Engagement
- Transitioning to challenges faced by businesses in creating effective engagement strategies:
- Evolving consumer expectations demand continuity across multiple channels but face obstacles due to data silos.
- Fragmented experiences arise when switching between isolated channels disrupts continuity for the user.
- Long-term customer connections are hard to maintain because signals from customers, channels, and AI systems are not unified.
Why Twilio?
Twilio's Vision and Market Position
- Twilio envisions a world where every digital interaction is exceptional, positioning itself as a key player in the communications ecosystem.
- As the market leader in communications, Twilio differentiates itself from foundational AI companies like Amazon by focusing on being a developer platform that partners with them for communication solutions.
- The importance of contextual data for personalization at scale is emphasized, highlighting Twilio's ownership of Segment, a Customer Data Platform (CDP).
- Trust from hundreds of thousands of customers underscores Twilio’s reliability in managing communications with 99.999% uptime during peak periods.
- With nearly 5,000 global carrier connections, Twilio boasts extensive partnerships to facilitate message delivery across various channels.
Exploring Communication Channels
- The discussion will cover three main channels: voice, messaging, and email, starting with voice as the most popular channel for creating engaging experiences.
- The Twilio logo symbolizes its origins in voice technology since its founding in 2008, emphasizing its long-standing expertise in enabling voice calls over software.
Voice Technology Capabilities
- Twilio supports multiple voice channels including SIP PBX systems and Public Switched Telephone Network (PSTN), ensuring comprehensive coverage for enterprises.
- WebRTC capabilities allow users to initiate calls directly from websites, enhancing user engagement through interactive features.
Challenges in Voice Applications
- Despite claims that "voice is dead," there has been a resurgence due to advancements like ChatGPT; however, challenges such as latency remain critical concerns for developers.
- Latency issues can lead to awkward interactions if response times exceed one second; both subjective and objective factors contribute to this challenge.
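One way to reason about the one-second threshold is as a budget spread across the stages of a voice turn. The sketch below is only an illustration: the stage names and millisecond values are made-up numbers, not measurements from the talk.

```python
# Illustrative per-stage latencies in milliseconds (made-up numbers).
STAGES_MS = {
    "speech_to_text": 200,
    "llm_first_token": 350,
    "text_to_speech": 150,
    "network": 100,
}

BUDGET_MS = 1000  # the one-second threshold mentioned above


def turn_latency(stages: dict) -> int:
    """Total time from end of user speech to start of the spoken reply."""
    return sum(stages.values())


def over_budget(stages: dict, budget_ms: int = BUDGET_MS) -> bool:
    """True when the summed stage latencies exceed the budget."""
    return turn_latency(stages) > budget_ms


def slowest_stage(stages: dict) -> str:
    """Name of the stage that contributes the most latency."""
    return max(stages, key=stages.get)
```

Adding a slow step, such as a blocking tool call, can push an otherwise acceptable turn past the budget, which is why each stage is worth measuring separately.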
Solutions Offered by Twilio
- Inflexibility among existing solutions leads customers to seek more control over their applications; Twilio aims to provide this flexibility through its developer platform approach.
- Orchestration is vital: interruptions and turn-taking must be handled accurately for a conversation to feel natural.
- Conversation Relay lets developers bring their own AI into the Twilio voice framework while addressing these common pain points.
How to Build Voice Agents with Twilio
Overview of Twilio's Voice Agent Capabilities
- Twilio's voice agent product became generally available in March and is experiencing significant demand, marking it as one of the fastest-maturing products in Twilio's history.
- The platform offers a low-latency solution for building voice agents, allowing businesses to focus on their unique differentiators while Twilio manages orchestration and other backend processes.
Architectural Insights
- A simplified architectural diagram shows Twilio on the left and the user on the right, connected by a WebSocket during live phone calls.
- Speech is transcribed in real time and the text is sent over the WebSocket for processing by an LLM or data lookups; the response is then returned into the call, with median latency under 0.5 seconds.
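The text-in, text-out exchange over the WebSocket might look roughly like the handler below. This is a sketch under assumptions: the field names (`type`, `voicePrompt`, `token`, `last`) are modelled on the Conversation Relay message format as understood here and should be checked against Twilio's documentation, and `fake_llm` is a hypothetical stand-in for a real model call. No network code is included; the function only maps one inbound frame to one outbound frame.

```python
import json


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call."""
    return f"You said: {prompt}"


def handle_message(raw: str, llm=fake_llm):
    """Map one inbound WebSocket frame to an optional outbound frame."""
    msg = json.loads(raw)
    if msg.get("type") == "prompt":
        # Assumed field: the transcribed caller speech arrives as "voicePrompt".
        reply = llm(msg.get("voicePrompt", ""))
        # Assumed reply shape: text to be spoken, marked as the final chunk.
        return json.dumps({"type": "text", "token": reply, "last": True})
    # Setup, interrupt, and other frame types need no spoken reply in this sketch.
    return None
```

A production handler would stream tokens as the model produces them rather than sending one final frame, since time to first audio is what the caller perceives.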
Developer Experience and Control
- Developers are encouraged to leverage Twilio for non-core tasks like orchestrating voice while retaining control over differentiated aspects of their applications.
- Users have full control over data location, which is crucial for compliance and privacy considerations.
Vendor Strategy for Speech Technologies
- The choice of speech-to-text (STT) and text-to-speech (TTS) vendors is strategic; instead of using many vendors, Twilio focuses on a few key players that can scale effectively.
- Partnerships with major hyperscalers like Google and Amazon, along with promising startups such as Deepgram and ElevenLabs, allow for better pricing and influence over technology roadmaps.
Latency Management
- By co-locating services within its infrastructure, Twilio minimizes latency by avoiding additional hops to external clouds during calls.
- This unique positioning enables lower latency than competitors since calls remain within the Twilio ecosystem throughout processing.
Building with TwiML
- TwiML (Twilio Markup Language) configures settings that are fixed before the call runs, such as provider selection and language detection; developers already familiar with it will use it to set up voice agents.
- Real-time interactions are managed via a service provider interface (SPI), allowing dynamic adjustments during calls based on events like interruptions or language changes.
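The split between pre-call configuration and in-call adjustments can be sketched as two small helpers. Both are assumptions rather than a confirmed API: the `ConversationRelay` attribute names (`url`, `language`, `ttsProvider`) and the mid-call `language` message fields are illustrative and should be verified against Twilio's TwiML reference before use.

```python
import xml.etree.ElementTree as ET


def build_twiml(ws_url: str, language: str = "en-US",
                tts_provider: str = "ElevenLabs") -> str:
    """Build the pre-call configuration as a TwiML document (assumed attributes)."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "ConversationRelay", {
        "url": ws_url,                # WebSocket endpoint of the agent application
        "language": language,         # assumed attribute name
        "ttsProvider": tts_provider,  # assumed attribute name
    })
    return ET.tostring(response, encoding="unicode")


def language_switch(lang: str) -> dict:
    """Mid-call adjustment sent over the WebSocket (assumed message shape)."""
    return {"type": "language", "ttsLanguage": lang, "transcriptionLanguage": lang}
```

The first helper runs once when the call is answered; the second illustrates the kind of message an application could send while the call is in progress, for example after detecting that the caller changed language.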
Fine-Tuning Capabilities
- The platform includes fine-tuning controls that enable customization of pronunciation for names or terms specific to different regions or languages.
- For example, correct pronunciation adjustments ensure accurate representation of names like "Sine" in Irish contexts rather than common mispronunciations.
Observability in Voice Agent Automation
Importance of Observability
- Discusses the significance of observability after deploying a voice agent, emphasizing the need to understand its performance and interactions with customers.
- Introduces two pillars of observability: basic metrics (latency, jitter) essential for debugging and enhancing user experience, and higher-level business intelligence (customer sentiment, bot hallucinations).
Twilio's Products for Observability
- Highlights Voice Insights, Twilio's flagship troubleshooting product that evolves from basic metrics to advanced features like aggregation and trend analysis across various calls.
- Mentions that end-to-end latency measurement is built into Twilio’s offerings and is available directly through Conversation Relay.
Conversational Intelligence
- Introduces Conversational Intelligence, a unified omni-channel intelligence solution that addresses previous data silos by combining messaging insights, voice insights, and voice intelligence into one product.
- Explains how this tool assesses customer interactions—such as transfer requests or bot errors—using pre-built language operators alongside custom options.
Real-Time Capabilities
- Describes the necessity of having transcripts for effective analysis; emphasizes that real-time processing is now possible in pilot mode with plans for broader availability next year.
- Discusses the potential impact of real-time language operators on automating AI experiences and conducting A/B testing against human agents.
Future Directions: Speech-to-Speech Models
- Addresses inquiries about speech-to-speech models versus traditional text-based systems; expresses excitement about advancements while acknowledging current limitations.
- Explains the cascade architecture (speech-to-text, then an LLM, then text-to-speech) and contrasts it with emerging speech-to-speech models, which promise lower latency by eliminating the intermediate text steps.
Challenges Ahead
- Notes that while speech-to-speech technology preserves tone and emotion better by processing audio directly, it is still early days for production use at scale.
- Cautions against rushing into deployment due to challenges such as reliance on text for control over bot responses and potential costs associated with new technologies.
Conclusion on Current Technology Readiness
- Concludes that although there is enthusiasm around new developments in voice AI agents, businesses should remain cautious until these technologies are fully matured.
- Emphasizes flexibility in choosing vendors when using traditional cascade models versus being locked into specific ecosystems with speech-to-speech solutions.
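The vendor flexibility of the cascade approach can be shown with a small sketch in which each stage is a plain function that can be swapped independently. The toy stages below are placeholders, not real speech or model providers.

```python
from typing import Callable


def make_cascade(stt: Callable[[bytes], str],
                 llm: Callable[[str], str],
                 tts: Callable[[str], bytes]) -> Callable[[bytes], bytes]:
    """Compose three independent stages into one audio-in, audio-out turn."""
    def run_turn(audio_in: bytes) -> bytes:
        text_in = stt(audio_in)
        text_out = llm(text_in)  # text is inspectable here (guardrails, logging)
        return tts(text_out)
    return run_turn


# Toy stand-ins so the sketch runs without any external service.
def toy_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def toy_llm(text: str) -> str:
    return text.upper()


def toy_tts(text: str) -> bytes:
    return text.encode("utf-8")
```

Replacing one stage leaves the other two untouched, and the intermediate text is available for control and review. A speech-to-speech model collapses all three stages into a single call, which is where both the latency benefit and the lock-in concern come from.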
High-Level Architecture of Conversation Relay
Overview of the Architecture
- A call arriving on the Twilio platform starts a Conversation Relay session, which handles speech-to-text and sends the transcribed text to the application over a WebSocket.
- The application passes the prompt and its context to a model on Bedrock and returns the response to Conversation Relay, where text-to-speech converts it back into audio for the caller.
- The architecture supports various compute options like EC2 and containers, providing flexibility in building custom solutions.
Demonstration of Application
- Cameron introduces a conversational helper bot designed to provide information about re:Invent events.
- The bot engages with users by answering questions about specific events such as the Replay Party, showcasing its ability to retrieve detailed information.
Interactive Features of the Bot
User Interaction Examples
- The bot provides comprehensive details about the Replay Party, including entertainment options and amenities available at the event.
- Users can request additional information on learning sessions before attending events; for example, asking specifically about Mandalay Bay's schedule.
Session Information Retrieval
- The bot efficiently retrieves session details happening at Mandalay Bay on Thursday, December 4th, demonstrating its capability to handle multiple queries seamlessly.
- Users can request summaries or have information sent via text message without interrupting ongoing voice conversations.
Multichannel Experience and Agent Functionality
Seamless Multichannel Communication
- The demonstration highlights dual-channel streaming between speech-to-text and text-to-speech using an AI agent running on EC2 with Bedrock integration.
- Users experience sub-second latency while interacting through voice commands that trigger responses across different channels (voice, SMS).
Agent Capabilities
- Agents are programmed with specific goals and instructions to assist users effectively during interactions related to re:Invent events.
- Multiple communication tools (SMS, voice calls, email) are utilized by agents to enhance user experience without disrupting primary conversations.
Architecture Diagram Insights
Tools and Knowledge Base Integration
- A high-level architecture diagram illustrates four distinct communication channel tools that allow users to interact flexibly based on their preferences.
- Separate tools for each communication channel enable more efficient processing rather than integrating them within a single agent framework.
Achieving Low Latency in Multi-Agent Systems
Challenges of Asynchronous Tool Calls
- Achieving sub-second latency is identified as a primary challenge in multi-agent systems. The approach involves defining tools as asynchronous tool calls to prevent blocking the main communication channel.
- Managing context and orchestrating collaboration between multiple agents adds complexity, necessitating the use of open-source frameworks.
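The idea of keeping slow tools off the main conversation path can be sketched with `asyncio`. This is a minimal illustration, not the presenters' implementation: `send_sms_tool` is a hypothetical tool whose `sleep` stands in for a network call, and the phone number is a placeholder.

```python
import asyncio


async def send_sms_tool(to: str, body: str) -> str:
    """Hypothetical slow tool; the sleep stands in for a network call."""
    await asyncio.sleep(0.2)
    return f"sms sent to {to}"


async def handle_turn(user_text: str, background: set) -> str:
    """Reply immediately and let the slow tool finish in the background."""
    if "text me" in user_text.lower():
        task = asyncio.create_task(send_sms_tool("+15550100", "Session summary"))
        background.add(task)  # keep a reference so the task is not garbage-collected
        task.add_done_callback(background.discard)
        return "Sure, I'm sending that to your phone now."
    return "How else can I help?"


async def demo():
    """Measure how long the spoken reply takes versus the background tool."""
    background = set()
    loop = asyncio.get_running_loop()
    start = loop.time()
    reply = await handle_turn("Please text me the summary", background)
    reply_delay = loop.time() - start
    results = await asyncio.gather(*background)
    return reply, reply_delay, results
```

Running `asyncio.run(demo())` shows the reply being produced without waiting for the tool, while the tool result still arrives once the background task completes.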
Open Source Frameworks for Agent Management
- Frameworks like Strands, LangChain, LangGraph, and Crew AI are introduced as solutions that alleviate the burden of writing manual code for managing agent interactions.
- These frameworks enable users to build agents capable of performing various tasks efficiently.
Hosting and Scaling with AgentCore
- AgentCore is presented as a solution for hosting agent architectures at scale. Users can choose to run their agents on self-hosted infrastructure or utilize AgentCore's capabilities.
- The introduction of dual-channel streaming via web sockets enhances low-latency connections within the architecture.
Memory Management in Agents
- Memory management features include short-term memory for sessions lasting less than 30 days and long-term memory that summarizes conversations after this period.
- This system ensures persistent user context while providing observability into agent performance metrics such as token usage and latency.
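The short-term and long-term split described above can be illustrated conceptually. This sketch is not the AgentCore API: `split_memory` and `naive_summary` are hypothetical helpers, and the 30-day window is taken from the description in the text.

```python
from datetime import datetime, timedelta

SHORT_TERM_WINDOW = timedelta(days=30)  # window taken from the text above


def split_memory(events, now, summarize):
    """Split (timestamp, text) events into recent events and a summary of older ones."""
    cutoff = now - SHORT_TERM_WINDOW
    recent = [(ts, text) for ts, text in events if ts >= cutoff]
    older = [text for ts, text in events if ts < cutoff]
    summary = summarize(older) if older else ""
    return recent, summary


def naive_summary(texts):
    """Placeholder summarizer; a real system would likely use an LLM."""
    return f"{len(texts)} earlier message(s); first: {texts[0]}"
```

Recent turns are kept verbatim for the current session, while older turns are condensed so that user context persists without growing the prompt indefinitely.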
Integration with Existing Functionality
- The AgentCore gateway allows integration with existing APIs and Lambda functions without requiring code rewrites, facilitating seamless functionality enhancement.
Messaging Capabilities Demonstrated
- A practical example illustrates how voice conversations can transition to text messaging while preserving context.
- An inquiry about directions leads to an immediate response regarding shuttle services from re:Invent to the Replay Party, showcasing real-time assistance capabilities.
Enhancing Messaging Solutions
Importance of SMS and Compliance Challenges
- The discussion transitions to messaging solutions where SMS is highlighted alongside compliance complexities that require reliable partnerships for effective implementation.
Twilio's Role in Messaging Scalability
- Twilio's capabilities are emphasized in handling large-scale messaging volumes effectively during events like Cyber Week.
Introduction of Rich Communication Services (RCS)
- RCS is introduced as an advanced messaging service offering branding verification and rich content delivery directly through native applications on iOS or Android devices.
Features of RCS Explained
- Key features include enhanced branding visibility through logos and verified senders, along with support for rich media content such as videos and interactive elements.
Enhancing User Experience with RCS and Messaging Channels
The Impact of RCS on User Experience
- RCS (Rich Communication Services) significantly enhances user experience by introducing interactive elements like quick reply buttons, carousels, and widgets such as Wallet and Calendar.
- Combining these features with agentic capabilities allows agents to send rich content, enhancing the interactivity of conversations across various platforms.
Multi-Channel Messaging Capabilities
- Twilio supports both RCS and WhatsApp, enabling businesses to leverage multiple messaging channels for better customer engagement.
- Businesses should offer preferred messaging options like WhatsApp while having fallback options like SMS or RCS where available.
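The preferred-channel-with-fallback rule can be expressed as a short selection function. The priority order and channel names below are illustrative assumptions; a real application would derive availability from the recipient's device and opt-in status.

```python
# Assumed fallback order when the preferred channel is unavailable.
CHANNEL_PRIORITY = ["whatsapp", "rcs", "sms"]


def pick_channel(preferred: str, available: set,
                 priority: list = CHANNEL_PRIORITY) -> str:
    """Use the preferred channel if possible, otherwise the best fallback."""
    if preferred in available:
        return preferred
    for channel in priority:
        if channel in available:
            return channel
    raise ValueError("no messaging channel available")
```

For example, a user who prefers WhatsApp but can only be reached by RCS or SMS would be sent an RCS message, and SMS remains the last resort.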
Email as a Key Channel in Agentic Applications
- Despite perceptions of email being outdated, it remains a vital channel for communication due to its ubiquity and ability to handle richer input/output.
- Email is positioned to play a significant role in agentic applications alongside voice and messaging channels.
Demonstrating Email Functionality
- A demo showcases how email can be used effectively within an agentic application context, highlighting its integration with other communication forms.
Context Preservation Across Channels
- The demo illustrates how session history is maintained across different communication channels (voice, SMS, email), allowing users to seamlessly continue their inquiries without losing context.
- This continuity emphasizes the importance of preserving conversation context for improved user interactions across all platforms.
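Context preservation across channels comes down to storing turns under one customer key and replaying them regardless of the channel they arrived on. The sketch below uses an in-memory store as a stand-in for a persistent table; `SessionStore` and `render_prompt` are hypothetical names, and the prompt format is only one possible choice.

```python
from collections import defaultdict


class SessionStore:
    """In-memory stand-in for a persistent table keyed by customer ID."""

    def __init__(self):
        self._turns = defaultdict(list)

    def append(self, customer_id: str, channel: str, role: str, text: str) -> None:
        """Record one turn together with the channel it happened on."""
        self._turns[customer_id].append(
            {"channel": channel, "role": role, "text": text}
        )

    def context_for(self, customer_id: str, max_turns: int = 10) -> list:
        """Return the most recent turns, regardless of channel."""
        return self._turns[customer_id][-max_turns:]


def render_prompt(turns: list, new_message: str) -> str:
    """Flatten prior turns plus the new message into one prompt string."""
    lines = [f"[{t['channel']}] {t['role']}: {t['text']}" for t in turns]
    lines.append(f"user: {new_message}")
    return "\n".join(lines)
```

Because the voice turns are included when a later SMS or email arrives, a follow-up such as "How do I get there?" can be answered without the user restating which event they meant.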
Building Advanced Communication Solutions
- Utilizing Twilio's platform along with AWS tools enables developers to create sophisticated communication solutions that integrate various channels effectively.
Architecture Overview and Integration with Twilio
Introduction to the Architecture
- The architecture being discussed is designed for scalability and ease of use, with plans to make it available for others to implement.
- Initial setup involves configuring domain settings and secure web socket connections necessary for programmable voice integration.
Networking and Traffic Management
- Traffic enters through an internet gateway and is load balanced across two availability zones; co-locating resources near customers reduces latency.
- EC2 instances in auto-scaling groups run the server, using WebSockets for dual-channel streaming through Conversation Relay.
Data Handling and Storage
- Transcribed speech is processed via Bedrock, which hosts agents that query a knowledge base stored in S3 buckets as individual JSON objects.
- Session persistence is managed using DynamoDB to maintain user conversations and history, while knowledge base integration is facilitated through asynchronous tool calls.
Frontend Implementation
- A simple React application hosted on Amplify streams audio using web sockets, demonstrating the frontend capabilities of the architecture.
Leveraging Twilio's Capabilities
Twilio Platform Features
- Twilio offers extensive platform capabilities including security compliance dashboards and voice insights applicable across all communication channels.
- Twilio's global reach and enterprise-grade tooling are available for each communication method, for example programmable voice over telephone numbers, WebRTC, and SIP.
Unified Communication Channels
- Programmable messaging APIs allow seamless access to multiple channels (SMS, WhatsApp, RCS), enabling developers to write code once while leveraging diverse communication options.
- Email services are integrated through SendGrid within the same platform framework as other messaging services.
Simplified Architecture Benefits
- All interactions from inbound communications (calls/messages/emails) funnel through a single endpoint in a simplified architecture that supports complex applications efficiently.
Conclusion: Collaboration Between AWS and Twilio
Key Takeaways from the Presentation
- Delivering scalable communications requires collaboration between platforms like AWS and Twilio; their combined efforts simplify complex architectures.
- Future developments will be shared via blog posts allowing users to build similar systems independently; attendees are encouraged to visit booths for more information.