ElevenLabs' Mati Staniszewski: How Voice Becomes the Interface for AI
The Human Story Behind ElevenLabs
Founding and Friendship
- The story of ElevenLabs began in 2022, rooted in a long-standing friendship between Mati Staniszewski and his co-founder, Piotr Dąbkowski. They met in high school and have remained best friends through their academic and professional journeys.
Inspiration from Polish Media
- Both founders are from Poland, where foreign films are typically voiced over by a single narrator who reads every character's lines in a flat, monotone voice. This experience inspired them to address the lack of emotional depth in audio narration.
Identifying Audio Challenges
- The founders recognized significant gaps in the audio domain, including language barriers and limited access to content in audio form, such as audiobooks and narrated news articles. They aimed to improve how voices convey emotion across media.
Building Frontier Models for Audio
Unique Approach to Development
- Unlike many AI companies that need massive funding up front, ElevenLabs focused on audio, a niche area at the time. This focus let the team innovate without following the conventional path.
Team Assembly Strategy
- The company adopted a remote-first approach, hiring top researchers based on their work rather than location. This strategy helped assemble a talented team dedicated to advancing audio technology.
Monetization Strategy
- The company prioritized early monetization to sustain operations and fund model development. That financial independence allowed continuous investment in improving the models while maintaining healthy margins.
Suite of Models Developed by ElevenLabs
Initial Model Focus
- Their first model was text-to-speech (TTS), designed to understand the context of a passage and convey the appropriate emotion in the synthesized speech. This was a first step toward addressing language barriers.
Expanding Capabilities
- Following TTS, they developed speech-to-text capabilities for accurate transcription. These foundational models were combined over time to create more complex interactive experiences.
Key Moments of Innovation
First "Wow" Moment
- A significant milestone came when the team successfully replicated the founder's own voice with AI. Hearing it was an emotional moment that showed how accurate AI-generated voices could be.
Emotional Intelligence Breakthrough
- Recent advances include building emotional intelligence into voice agents so they can adapt their responses to the user's emotional state, an important step toward more human-like interactions.
Opportunities in Voice Agents
Current Applications
- Customer support is the most widely recognized application for voice agents, but interest is growing in revenue-generating roles such as sales calls, where agents can streamline the process.
Overlooked Areas for Development
- Citizen support services are an overlooked opportunity: voice agents could provide essential information during crises or handle government inquiries, as demonstrated by initiatives in Ukraine during the war.
Lessons for Startup Founders
Team Structure Insights
- Despite rapid growth to over 400 employees, ElevenLabs keeps its teams small (fewer than ten members each). Engineers are embedded in non-technical teams, which keeps the company agile and speeds up decision-making across departments.
Emphasis on Technical Skills Across Teams
- Every team includes technical talent, which raises productivity through automation and helps upskill non-engineering staff. This has proven valuable as demand for coding skills grows across all functions.
Insights on Audio Models and Their Development
The Role of Artistry in Audio Models
- Jensen's perspective draws a distinction between technology and artistry in audio models: speech-to-text is a technological achievement, while text-to-speech is a form of artistic expression. This view underscores the creative side of developing audio models.
Importance of User Engagement
- To enhance audio model performance, it is crucial to engage with users directly, gather their preferences, and utilize this data for fine-tuning. Different domains such as healthcare, finance, and education require tailored approaches when deploying these models.
Continuous Improvement through Quality Focus
- A commitment to quality in model development can provide a competitive edge. Understanding user problems and workflows is essential for creating effective solutions beyond just research-focused efforts.
Integration of Knowledge with Audio Models
- Successful voice agents combine audio models with knowledge systems to facilitate multi-channel interactions. Evaluating and monitoring these integrations are vital for enhancing user experience across various applications.
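The idea of pairing audio models with a knowledge system, and then monitoring the result, can be illustrated with a minimal sketch. Every function here is hypothetical and is not an ElevenLabs API. `transcribe` and `synthesize` are stubs that stand in for real speech-to-text and text-to-speech models, and `retrieve` is a naive keyword lookup that stands in for a real knowledge system. The per-turn log records whether each reply was grounded in the knowledge base, which is one simple signal that could feed evaluation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def transcribe(audio: bytes) -> str:
    """Hypothetical speech-to-text stub: treats the 'audio' as UTF-8 text."""
    return audio.decode("utf-8")


def retrieve(query: str, knowledge: Dict[str, str]) -> Optional[str]:
    """Hypothetical knowledge lookup: returns the first answer whose keyword appears in the query."""
    q = query.lower()
    for keyword, answer in knowledge.items():
        if keyword in q:
            return answer
    return None


def synthesize(text: str) -> bytes:
    """Hypothetical text-to-speech stub: returns the reply text as bytes instead of audio."""
    return text.encode("utf-8")


@dataclass
class VoiceAgent:
    knowledge: Dict[str, str]
    fallback: str = "Let me connect you with a human agent."
    log: List[dict] = field(default_factory=list)  # per-turn monitoring trail

    def handle_turn(self, audio: bytes) -> bytes:
        text = transcribe(audio)                   # audio in -> text
        answer = retrieve(text, self.knowledge)    # ground the reply in known facts
        reply = answer if answer is not None else self.fallback
        # Record each turn so grounded vs. fallback replies can be evaluated later.
        self.log.append({"user": text, "reply": reply, "grounded": answer is not None})
        return synthesize(reply)                   # text -> audio out
```

In this sketch, an agent built with `{"refund": "Refunds take five business days."}` answers a question that mentions "refund" from the knowledge base. Any other question falls back to a handoff message. Both turns are logged, so the share of grounded replies can be tracked over time.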
Building Trust through Diverse Offerings
- Building a trusted platform involves providing templates for creating agents and workflows in creative spaces. With over 20,000 voices contributed by users, covering diverse languages and styles becomes increasingly important for accessibility and ease of use.