AppTek GmbH Managing Director Volker Steinbiss on Why AI Dubbing Is So Hard
Introduction to AppTek
Overview of AppTek and its Research
- Volker Steinbiss, managing director of AppTek, discusses the company's focus on language technology and machine learning.
- AppTek has been operational for 30 years, with significant contributions from Professor Hermann Ney's research group in machine translation and speech recognition.
- The company emphasizes evaluation-driven, high-level scientific research that remains customer-focused thanks to its smaller size.
- AppTek covers speech recognition, machine translation, LLMs (Large Language Models), text-to-speech, and natural language processing under one roof.
- The team spans diverse disciplines working closely together, focusing primarily on audio-related technologies.
Challenges in AI Dubbing
Unique Positioning of AppTek in the Market
- The company attracts talent by offering freedom to work on complex problems within a collaborative environment.
- Many companies in the AI dubbing space integrate technologies from other vendors without owning their core technology; AppTek differentiates itself by owning and deeply understanding its entire technology stack.
- This deep expertise allows it to fine-tune solutions rather than rely on off-the-shelf products.
Data Collection and Legal Compliance
- AppTek maintains a dedicated organization for legally sourced data collection, which provides a competitive advantage in professional settings.
Technical Challenges in AI Dubbing
Complexity of the Dubbing Pipeline
- Steinbiss highlights that even basic AI dubbing presents significant challenges due to the complexity of the pipeline involved.
- Mistakes made early in the process can propagate through the entire system, complicating outcomes significantly.
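The propagation problem above can be illustrated with a toy example (invented words and a stub lexicon, not a real system): a single plausible recognition error early in the pipeline flips the downstream translation entirely.

```python
# Toy illustration of early-error propagation in a dubbing pipeline.
# TOY_MT is a hypothetical stand-in for a machine translation stage.
TOY_MT = {
    "the bat flew": "die Fledermaus flog",
    "the bad flu": "die schlimme Grippe",
}

correct = "the bat flew"
misheard = "the bad flu"   # a plausible ASR confusion of similar sounds

print(TOY_MT[correct])     # the intended meaning
print(TOY_MT[misheard])    # one small ASR slip, a completely different sentence
```

Because every later stage trusts its input, nothing downstream can detect that the second transcript was wrong.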
Addressing Emotion and Prosody Issues
- Current approaches often utilize a pipeline method where audio is transcribed into text before being translated; however, this can lead to loss of emotional nuance and speaker identity during translation.
Exploring the Future of Language Processing
Approaches to Understanding Emotion in Language
- The discussion begins with various approaches to representing emotion, contrasting discrete categories (e.g., three levels of happiness) with the potential for automatic processing along a continuum.
- Stress adds further complexity: pitch indicates stress in English, whereas in tonal languages like Chinese pitch serves different functions.
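The two representations contrasted above can be made concrete with a toy mapping (an illustrative scheme, not AppTek's): a few discrete happiness levels on one side, a continuous score in [0, 1] that an automatic system could process directly on the other.

```python
# Toy discrete-vs-continuous emotion representation.
DISCRETE_LEVELS = ["slightly happy", "happy", "very happy"]

def level_to_score(label: str) -> float:
    """Map a discrete happiness label onto a [0, 1] continuum."""
    idx = DISCRETE_LEVELS.index(label)
    return idx / (len(DISCRETE_LEVELS) - 1)

def score_to_level(score: float) -> str:
    """Quantize a continuous score back to the nearest discrete label."""
    idx = round(score * (len(DISCRETE_LEVELS) - 1))
    return DISCRETE_LEVELS[max(0, min(idx, len(DISCRETE_LEVELS) - 1))]

print(level_to_score("happy"))   # 0.5
print(score_to_level(0.9))       # "very happy"
```

A continuum avoids the arbitrary boundaries between labels, which is what makes it attractive for automatic processing.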
Challenges in Language Translation
- Significant differences between languages can complicate translation tasks; copying prosodic cues may be a solution depending on the specific language pairs involved.
- Bridging the language gap remains a key goal, especially for audio translation: current capabilities exist for text but are not yet fully realized for spoken language.
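One piece of "copying prosodic cues" can be sketched as a time-alignment step (a simplification; a real system would work per-syllable and per-language): the source pitch contour is resampled to the target utterance's length so the stress peak carries over.

```python
# Hedged sketch: linearly resample a pitch (F0) contour to a new length
# so prosodic shape from the source survives in a longer/shorter target.
def resample_contour(contour: list[float], target_len: int) -> list[float]:
    """Linearly interpolate a pitch contour to target_len points."""
    if target_len == 1:
        return [contour[0]]
    result = []
    for i in range(target_len):
        # Fractional position in the source contour, in [0, len(contour)-1].
        pos = i * (len(contour) - 1) / (target_len - 1)
        lo = int(pos)
        hi = min(lo + 1, len(contour) - 1)
        frac = pos - lo
        result.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return result

source_f0 = [120.0, 180.0, 140.0]          # Hz; stress peak in the middle
target_f0 = resample_contour(source_f0, 5) # target utterance is longer
print(target_f0)   # [120.0, 150.0, 180.0, 160.0, 140.0] -- peak preserved
```

Whether such direct copying is appropriate depends on the language pair, as the preceding bullet notes: for a tonal target language, pitch cannot simply be transplanted.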
Vision for Accessibility and Inclusion
- The speaker emphasizes the importance of making audio content accessible, particularly for individuals who are hard of hearing or visually impaired. Collaborations with institutions like Gallaudet University aim to address these challenges.
- There is a vision to ensure all video content globally is AI-accessible, enhancing inclusivity across diverse audiences.
Quality Concerns in AI Systems
- A cautionary note is raised regarding free systems that may compromise quality; there’s concern that users might become accustomed to lower standards.
- Maintaining high quality standards in AI-generated content is crucial; suggestions include minimum quality standards and labeling systems to distinguish human-generated from automated outputs.
Future Directions and Ethical Considerations