The truth behind AI voice mode #ai #chatgpt #tech
The Impact of AI Voice Models on Perception
Overview of Current AI Voice Technology
- Discussion begins with a humorous reference to a viral video about eating a hot dog, highlighting the disconnect between user experience and AI performance.
- The voice mode in ChatGPT operates on GPT-4, which was released in May 2024, while the text version has advanced to GPT-5.5.
Cost Implications of AI Model Choices
- OpenAI's decision to retain an older voice model is primarily driven by cost; audio tokens are significantly more expensive than text tokens (6 to 8 times).
- This financial strategy leads to using cheaper models for voice interactions, despite newer models being available for developers willing to invest.
Brand Perception and Market Competition
- The outdated voice model negatively impacts public perception; users associate poor responses with the overall intelligence of ChatGPT rather than understanding the underlying budget constraints.
- Competitors like Google and Anthropic are actively improving their voice technologies, putting OpenAI at risk of losing market relevance due to its cost-saving measures.