# 182 Monsters, Aliens, and Lip-Synced Dubbing with MARZ CEO Jonathan Bronfman

Name: # 182 Monsters, Aliens, and Lip-Synced Dubbing with MARZ CEO Jonathan Bronfman
Uploaded: 2023-09-07T13:50:17.000Z
Duration: 1 h 34 s

Introduction to MARZ and Lip Manipulation Technology

Challenges in Traditional Visual Effects

Traditional lip manipulation in visual effects is deemed impossible because facial movements are so complex that manipulated results often fall into the "uncanny valley."

Overview of MARZ

Jonathan Bronfman, CEO of Monsters Aliens Robots Zombies (MARZ), discusses the company's focus on AI-enabled visual effects since its launch in 2018.

MARZ has worked on over 100 high-profile TV projects, including The Umbrella Academy, Moon Knight, and WandaVision.

Founding Principles of MARZ

Founded in August 2018, MARZ aims to differentiate itself within a homogeneous visual effects market by focusing on premium episodic content.

The company prioritizes quantifiable metrics such as time and cost rather than subjective quality assessments, recognizing constraints in television production budgets and timelines.

Embracing AI for Differentiation

To stand out from competitors, MARZ decided to integrate AI technology into its processes, although it was initially uncertain about specific use cases.

The team identified opportunities where deep learning could enhance visual effects applications in Hollywood.

Innovative Products: Vanity and LipDub

Vanity: Digital Makeup Solution

The first product developed was Vanity, a digital makeup tool that allows for cosmetic enhancements like softening wrinkles or de-aging faces efficiently.

Beautifying a single face shot traditionally takes an artist half a day to two days; Vanity reduces this to just 20 minutes.

Efficiency Through Technology

Each second of film consists of 24 frames; traditional methods require extensive manual work across all frames. In contrast, Vanity simplifies this process by allowing artists to work on only a few images while extrapolating results across others.
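As a rough illustration of the arithmetic behind these figures, the sketch below counts the frames in a shot and estimates the time saved per face shot. Only the 24 frames per second, the half-day-to-two-day range, and the 20-minute figure come from the discussion; the 5-second shot length, the 8-hour workday, and the function names are assumptions made for the example.

```python
FPS = 24  # frames per second of film, as stated in the discussion


def frames_in_shot(seconds: float, fps: int = FPS) -> int:
    """Return the number of frames in a shot of the given length."""
    return round(seconds * fps)


def speedup(manual_minutes: float, tool_minutes: float) -> float:
    """Return how many times faster the tool is than manual work."""
    return manual_minutes / tool_minutes


WORKDAY_HOURS = 8    # assumption: the source does not define "a day"
VANITY_MINUTES = 20  # per-shot time quoted for Vanity

# Hypothetical 5-second shot: every one of these frames needs manual work
# in the traditional workflow.
shot_frames = frames_in_shot(5)

# Manual range quoted in the discussion: half a day to two days.
low = speedup(0.5 * WORKDAY_HOURS * 60, VANITY_MINUTES)
high = speedup(2 * WORKDAY_HOURS * 60, VANITY_MINUTES)

print(f"Frames in a 5-second shot: {shot_frames}")
print(f"Estimated speedup: {low:.0f}x to {high:.0f}x")
```

Under these assumptions, even a short shot involves over a hundred frames, which is why extrapolating from a few edited images matters so much for turnaround time.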

Introduction of LipDub

Following the success of Vanity, MARZ developed LipDub as another application of its machine learning expertise in facial nuances. The product was announced approximately 120 days before the discussion.

Current Status and Future Outlook

Company Growth Without Additional Funding

MARZ last raised funds in 2021 and has continued to gain traction since then without seeking additional funding.

Business Model Overview

Components of the Business

The business consists of two main components: traditional visual effects and machine learning research and development.

The capital raised in 2021 was intended for the entire business. Current operations are capital-efficient because the traditional visual effects work finances the R&D.

Future aspirations include having the R&D side support the visual effects side financially.

Industry Challenges

The industry is currently facing significant strain due to ongoing negotiations among actors, writers, and studios.

MARZ's headcount has dropped from about 300 employees because of a lack of available work.

The company is waiting for the negotiations to be resolved so that growth can resume.

LipDub AI Development

Origin and Technology Focus

LipDub AI emerged from expertise in facial animation; it became clear that dubbing was a logical next step after mastering facial manipulation.

Traditional lip manipulation in visual effects is challenging due to the uncanny valley effect and high costs associated with achieving photorealism.

Functionality and Quality Standards

LipDub AI processes audio stems alongside original footage through software, yielding results within 30 seconds.

The technology targets Hollywood standards by working with high-resolution 4K ProRes files, which differentiates it from consumer-grade solutions.

Market Positioning and Client Strategy

Research Advantage

A competitive edge lies in fully automated solutions that integrate seamlessly into existing production workflows without adding friction.

Client Engagement

While specific client details are confidential, Hollywood remains the primary target market for their technology.

Feedback and Future Directions

Reception from Studios

Initial reception has been positive; however, there are still unresolved edge cases that need addressing before full deployment.

Market Suitability Considerations

Adoption Curve of Content and Linguistic Challenges

Overview of Content Adoption

The discussion revolves around the adoption curve for various types of content, with a focus on major features like "Oppenheimer" and "Barbie." The speaker anticipates that within six months to a year, current limitations in their product will be resolved.

Financial Considerations in Remastering

A key point raised is the financial aspect studios must consider when remastering or visually dubbing content. The cost-effectiveness of dubbing films like "The Avengers" into languages such as Mandarin is crucial for studios.

Linguistic Challenges in Dubbing

The need for a pricing model that appeals to all segments of film production is emphasized, highlighting the importance of addressing linguistic challenges in dubbing across different languages.

Language Agnosticism and Specificity

The technology discussed is described as language agnostic, relying on universal sounds and lip movements. However, specific language nuances are necessary for authentic results.

For instance, while a Mandarin dub of "The Avengers" may look visually convincing, native speakers emphasize the need for language-specific adjustments to ensure authenticity.

Expressiveness Across Languages

Differences in expressiveness between languages are noted: Mandarin speakers tend to have less pronounced lip movements than Spanish speakers. Handling this requires incorporating data specific to each language's characteristics.

Ethical Use of AI in Dubbing

Commitment to Ethical Standards

The speaker emphasizes their commitment to ethical AI use, rejecting requests that would misrepresent individuals or manipulate speech unethically.

Enhancing Global Engagement

The stated intention behind the technology is to make foreign content more engaging for international audiences while ensuring actors' interests are considered.

Actors' Perspectives on Dubbing Technology

Actors’ Reactions and Motivations

Actors' feedback on the technology varies; many view it as an opportunity for increased revenue rather than purely as an artistic concern.

Distinction Between Creation and Augmentation

A distinction is made between creating digital avatars and augmenting existing footage. The software focuses on enhancing mouth movements rather than on complete facial reconstruction.

Potential Benefits for Studios and Actors

Revenue Growth through Localization

Discussion on Likeness Rights and Technology in Dubbing

Ethical Considerations in Using Likeness

The speaker emphasizes the importance of not using individuals' likenesses without contractual rights, advocating for transparency and ethical practices in technology applications.

They suggest that while their company is part of ongoing negotiations with WGA and SAG, there are various technological solutions available to enhance projects.

Enhancements in Dubbing Technology

The discussion highlights advancements such as deepfakes and full face replacements that are aimed at augmenting performances rather than replacing them, enhancing viewer engagement.

Netflix data indicates that viewers prefer dubbed audio over subtitles even though their self-reported survey preferences suggest otherwise: backend analytics show 90% favoring the audio dub.

Synchronization Challenges

As language proficiency improves, viewers notice discrepancies between dubbed audio and original performances, highlighting the need for better synchronization techniques.

The manipulation focuses primarily on the lower half of the face to maintain natural expressions while ensuring lip movements match emotional tones conveyed by voice actors.

Artistry in Dubbing

Dubbing is described as an art form where script revisions ensure sentiment is accurately conveyed across languages; successful examples like "Squid Game" demonstrate effective dubbing practices.

Technical Limitations and Future Directions

The company does not handle translation; it focuses solely on syncing the voice tracks that clients provide. It currently does not work with animated content because of technical constraints.

There are advanced synthetic voice companies available for audio production; however, this company specializes exclusively in facial localization challenges rather than voice synthesis.

Research Developments in Localization Technology

The speaker notes that their research team has been focused on localization since 2019, indicating a steady progression rather than a reactionary response to recent AI trends.

Product Development Insights

Breakthrough Moment in March

The speaker notes a significant breakthrough that occurred around March, marking a pivotal moment after two years of development. This moment led to substantial progress in various aspects such as preprocessing, tracking, and integration.

The culmination of this effort resulted in the release of their product, indicating a successful transition from concept to execution.

Commitment to Continuous Improvement

The speaker emphasizes an ongoing commitment to research and improvement, suggesting that achieving the highest quality standards is a long-term endeavor.

They express a focus on creating Hollywood-grade content, indicating that they are not currently planning to target the creator side or platforms like YouTube.

Challenges with Creator Market

There is skepticism about creators' willingness to pay for high-quality localization services. MrBeast is cited as an example of a creator who has been exploring this area but faces challenges in monetization.

The speaker suggests that platforms like YouTube may be better placed to address creator needs than their own company would be in pursuing this market.

Differentiation Through Quality Standards

A distinction is made between the quality standards required for Hollywood-level content versus those acceptable for social media platforms like TikTok or YouTube.