# 163 The Future of Live Multilingual Captioning Ai-Media CEO Tony Abrahams

Ai-Media: Captioning, Subtitling and Translation Provider

In this podcast episode, Tony Abrahams, the CEO and co-founder of Ai-Media, discusses how his company provides captioning, subtitling and translation services. He talks about how he got involved in language access services and the importance of captions for people with hearing impairments as well as those who speak a language other than English.

Background

  • Tony Abrahams is the CEO and co-founder of Ai-Media.
  • Ai-Media is a New York-based captioning, subtitling and translation provider that was founded in Australia.
  • Tony's background is in economics, management consulting and finance.
  • After quitting his job in finance, Tony met Brian Walsh from Foxtel at a cocktail party, where Brian asked whether Tony could provide captions for Foxtel's platform.

Importance of Captions

  • Alex Jones, Tony's partner at the time, is profoundly deaf. He explained to Brian Walsh that he wouldn't pay $100 a month for Foxtel because he couldn't understand what was being said without captions.
  • Over 50% of millennials and Gen Z watch everything with captions. Captions are not only beneficial for people with hearing impairments but also for those who speak a language other than English.
  • Captions help improve comprehension for everybody.

Evolution of Ai-Media

  • After a six-month consulting engagement, Foxtel decided to introduce captions to its platform and invited Tony to set up a company and tender for that work.
  • Initially providing recorded media services for the pay-TV industry in Australia, Ai-Media quickly realized that the opportunity was much bigger than that. They needed to focus not just on recorded media but also on live content.
  • Ai-Media realized that they needed to be doing this not just in entertainment and media but also in education, where providing access to grade school and high school is far more important.

Conclusion

Ai-Media provides captioning, subtitling and translation services. Captions are beneficial not only for people with hearing impairments but also for those who speak a language other than English. Ai-Media started by providing recorded media services for the pay-TV industry in Australia before realizing that they needed to focus on live content as well. They now provide their services across various industries including education.

AI and Respeaking

In this section, Tony Abrahams explains how, in the early days, Ai-Media used a respeaker as an intermediary in the live captioning process, because automatic speech recognition alone could not deliver the level of accuracy that was needed.

Evolution of Respeaking

  • The early workaround was to have each respeaker spend many hundreds of hours training and tuning the speech engine to their unique, individual voiceprint.
  • The recent acquisition of EEG has massively accelerated the transition away from a respeaker-dependent model: over 90% of content now runs through the fully automated Lexi solution.
  • Live captioning is becoming more scalable and more accurate, and supports more languages and applications.

When to Use Respeaking

  • A respeaker is put on content that is really important but where the AI cannot deliver the required results.
  • Content with multiple speakers, mixed-quality audio, background noise, singing, or multiple languages still requires humans; that accounts for roughly 10% of the volume.
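The split described above can be sketched as a simple routing rule. This is a toy illustration only; the condition names and function are made up for this sketch and are not Ai-Media's actual logic or API.

```python
def route_captioning(job: dict) -> str:
    # Toy routing rule: content with hard audio conditions (the ~10%)
    # goes to a human respeaker; everything else goes to automatic
    # captioning (Lexi). Condition names are illustrative only.
    hard_audio = {"multiple_speakers", "mixed_quality_audio",
                  "background_noise", "singing", "multiple_languages"}
    if hard_audio & set(job.get("conditions", [])):
        return "respeaker"
    return "lexi"

print(route_captioning({"conditions": ["singing"]}))  # respeaker
print(route_captioning({"conditions": []}))           # lexi
```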

iCap Network and Clean Audio

In this section, Tony Abrahams explains how EEG's encoding products ensure clean audio into the system, how the most appropriate speech recognition tool is then matched to the content, and how Ai-Media digs in behind customers' firewalls to understand the data and metadata associated with the content being captioned.

Importance of Clean Audio

  • EEG has 43 years of intellectual property built up in its encoding products, which ensure the cleanest possible audio into the system.
  • The most appropriate speech recognition tool is matched to the content, starting from that good-quality audio.
  • Plugged into the iCap Network, Lexi currently performs better than humans simply because it is quicker.

Understanding Content

  • Ai-Media digs in behind customers' firewalls to understand the data and metadata associated with the content being captioned.
  • For a Chicago broadcaster, for example, all of the Chicago suburbs and nearby areas are automatically featured in the dictionary set so that the AI understands the context it is captioning.

When to Use Respeaking vs. Lexi

In this section, Tony Abrahams talks about when to use respeaking vs. Lexi.

Importance of Respeaking

  • While only 10% of content might require respeakers, it tends to be the most important content for customers.
  • For example, the halftime show at the Grammys, where someone is singing bilingually in English and Spanish and it is really important that the lyrics come up.

Advantages of Lexi

  • Even at the same level of accuracy, delivering captions with a latency that is two seconds lower makes for a much better viewer experience.
  • There are some pieces of content now that he would actually recommend Lexi for over a respeaker.
  • Ai-Media has hundreds of highly skilled respeakers and stenographers, some of whom are bilingual.

Overview of iCap Network and Lexi

In this section, Tony introduces the iCap Network and Lexi, which are key elements of their technology stack for delivering high-quality automatic captioning.

The Tech Platform

  • There are three key elements to their technology stack: encoding, iCap Network, and automatic captioning (Lexi).
  • Encoding gets audio into the system and can be configured with a customer to optimize for automatic captioning output.
  • The iCap Network connects all devices to each other and to the iCap Cloud where multiple automatic speech recognition engines sit.
  • Audio is run through the engine on the cloud using iCap. Captions are then sent back to the inserter and made available to viewers.

Lexi 3.0

  • The latest version of Lexi (3.0) automatically places captions and moves them if they would cover on-screen text, someone's face, or the play of the ball.
  • It delivers 30% fewer errors than its predecessor, Lexi 2.0.
  • AI capabilities have increased significantly in generating accurate speech-to-text results.

Enterprise vs Consumer SaaS

In this section, Tony discusses how their product is fully enterprise-focused rather than direct-to-consumer. He also explains how they're doing what Microsoft Teams and Zoom have done but in a professional broadcast environment.

Enterprise-Focused Product

  • Their product is fully enterprise-focused rather than direct-to-consumer.
  • They're doing what Microsoft Teams and Zoom have done but in a professional broadcast environment and broadcast-adjacent environments.
  • EEG, which they've owned for two years, will be rebranded and relaunched as an integrated product range.

Lexi 3.0 and Measuring Captioning Quality

In this section, Tony and Florian discuss Lexi 3.0, an improved ASR engine that is being used for live captioning. They also talk about the KPIs used to measure captioning quality.

Lexi 3.0

  • Lexi 3.0 is an improved ASR engine that is being used for live captioning.
  • It performs better than other available options in the context of live captioning.
  • With Lexi 3.0, the NER score (Number of words, Edition errors, Recognition errors) has moved to about 98.7–98.8, which means a third fewer errors than previous versions.
  • With a single product launch, the whole quality bell curve has shifted significantly closer to what is achievable with studio-quality audio and access to context.

Measuring Captioning Quality

  • There can be debates about AI and whether there will be a winner takes all or if there will be winners in certain niches.
  • Captioning quality can be measured with a simple word error rate or with the NER model (Number of words, Edition errors, Recognition errors), which weights the types of errors by their severity.
  • The internationally accepted benchmark is an NER score of 98 or above.
  • Ai-Media's respeakers typically score 99.5 or above on the NER model.
  • In situations with studio-quality audio and access to context, measuring captioning quality becomes much simpler.

This section discusses how Lexi 3.0 improves live captioning and how captioning quality is measured. The NER score is used to measure the severity of errors, and Lexi 3.0 has significantly improved this score compared to previous versions.
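The NER score behind these numbers has a simple formula: with N the number of words in the captions, E the weighted edition errors, and R the weighted recognition errors, the score is (N − E − R) / N × 100. A minimal sketch in Python, with example error counts chosen purely for illustration:

```python
def ner_score(n_words: int, edition_errors: float, recognition_errors: float) -> float:
    # NER accuracy as a percentage: N = number of words, E = edition
    # errors, R = recognition errors. Individual errors are typically
    # weighted by severity (e.g. 0.25 / 0.5 / 1) before being summed.
    return (n_words - edition_errors - recognition_errors) / n_words * 100

# 1,000 words with 8 weighted edition errors and 5 weighted recognition
# errors lands right in the cited Lexi 3.0 range:
print(round(ner_score(1000, 8, 5), 1))  # 98.7
```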

NER and Translation

In this section, Tony talks about how their product can deliver an NER score greater than 98, and about the multilingual translation component of their business.

NER

  • Content can be put through Lexi to deliver an NER score greater than 98.
  • The workflow in 2018 started with captioning in English: respeakers paraphrased what was being said into short, sharp sentences optimized for the machine translation algorithm.
  • The paraphrased captions were sent up to the iCap cloud, run through one of several translation engines, and the translated text was sent back to a different page or app.
  • Synthetic voice engines then let users listen to the automatically generated translation as synthetic speech.
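The 2018 workflow above amounts to a four-stage pipeline. The following is a toy sketch of that flow; every function here is a hypothetical stand-in, not Ai-Media's actual API.

```python
def respeak(audio_text: str) -> str:
    # Stand-in for the human respeaker producing English captions.
    return audio_text

def paraphrase(captions: str) -> str:
    # Stand-in for rephrasing into short, sharp, MT-friendly sentences.
    return captions.strip().capitalize()

def machine_translate(text: str, target: str) -> str:
    # Stand-in for a cloud translation engine reached via the iCap cloud.
    return f"[{target}] {text}"

def caption_translation_pipeline(audio_text: str, target: str = "es") -> str:
    # respeak -> paraphrase -> machine-translate -> deliver to a page/app
    return machine_translate(paraphrase(respeak(audio_text)), target)

print(caption_translation_pipeline("the game is tied at halftime"))
# [es] The game is tied at halftime
```

In production the final text would be handed to a synthetic voice engine rather than printed, as the bullet points describe.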

Translation

  • Converting and translating content could make them like a remote simultaneous interpreting provider in certain scenarios.
  • Large language models are delivering breakthrough results in training for translation between the top 150 languages.
  • Rapid development is happening on fully automated translation products that cut out respeakers from the mix.
  • The same technology that made ChatGPT successful applies in their context: speech recognition has become sophisticated enough to allow sentence-by-sentence interpretation.

Improvements in AI and Growth Strategy

In this section, Tony Abrahams discusses the improvements in AI and how it has benefited Ai-Media's business. He also talks about the company's growth strategy through acquisitions and organic growth.

Improvements in AI

  • The shift to Lexi 3.0 over Lexi 2.0 has resulted in significant improvements in AI.
  • These underlying improvements have been very beneficial for Ai-Media's business as it helps improve the outcome that they can deliver with their encoding solutions, with the iCap Network, and with Lexi.

Growth Strategy

  • Ai-Media has done five acquisitions so far, one of which was done ten years ago in the UK while three were done in 2020.
  • All of these services businesses are similar to Ai-Media but have a footprint in the US which helped them get scale in North America.
  • EEG was a completely different acquisition as it was a vertical integration play. It provided them with a defensible moat while the world transitions to automatic speech recognition.
  • EEG's technology can now be deployed outside the US and beyond broadcast; additional standards and functionality have been added to make it work globally.
  • The EEG product suite has a 43-year history and some really important brand equity, particularly in the US.

Integration of Acquisitions

In this section, Tony Abrahams talks about how they integrated their acquisitions into Ai-Media.

Integration of EEG

  • EEG was the biggest acquisition as they paid $35 million for its technology.
  • The easiest integration of any of those companies was with EEG because they already worked really well together and there was no replication of any of these functions because it was a vertical integration play.
  • With all of the other businesses, everyone had their own way of doing the same thing. And so you had to consolidate, you had to find new ways of doing it.
  • Little else has changed at EEG since the acquisition; the EEG business line has simply doubled in the first 18 months of owning what is a 43-year-old business.

Ai-Media as a Publicly Listed Company

In this section, Florian talks about how Ai-Media is able to finance acquisitions because it is a publicly listed company.

  • One reason why Ai-Media is able to do these types of acquisitions and finance them is that it is a publicly listed company.
  • It is one of the very few language service providers (LSPs) that are still publicly listed.

Listing and Future Plans

In this section, the speaker talks about the decision to list their company in 2020 and how it allowed them to raise funding quickly. They also discuss the benefits of being a public company and share their plans for the future.

Listing and Benefits of Being a Public Company

  • The speaker mentions that they listed their company in 2020 when it was an easier IPO window than it is today.
  • Being listed allowed them to make a clear bid for EEG and raise funding within 36 hours.
  • The discipline of being a public company allows them to disclose performance and be transparent with employees.
  • They have five-year, three-year, and one-year plans broken down into four-month intervals or trimesters.

Future Plans

  • Their top priority for 2023 is getting into a sustainable commercial model for iCap, which historically hasn't had proper focus or development.
  • They continue to develop encoding products such as physical encoders (4K to basic SD options), firmware, software, Alta (an IP encoding product), and Falcon (a cloud-based encoder).
  • They are investing heavily in upgrading the iCap network that will be released in the next few weeks with updated security, simpler pricing models, increased reliability, and uptime as captioning provided by Lexi doubles every year.
  • Their main focus is on delivering high-quality live captioning to professional customers while being an indispensable partner to them.

Upgrading iCap Network

In this section, the speaker talks about upgrading the iCap network: increasing security measures, improving uptime from three nines to four nines, and simplifying the pricing model.

Upgrading iCap Network

  • The iCap network needs a major refresh, more security, and four nines uptime.
  • Historically, EEG did not invest much in this area because they weren't getting any revenue from it.
  • They are investing heavily now with a customer focus on increased security, reliability, and uptime as captioning provided by Lexi doubles every year.
  • They are simplifying the pricing structure to an hourly charge to support the network.
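For scale, the jump from three nines to four nines of uptime is a tenfold cut in permitted downtime, which a quick back-of-the-envelope calculation makes concrete:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(availability: float) -> float:
    # Maximum annual downtime implied by an availability target.
    return (1 - availability) * MINUTES_PER_YEAR

print(round(downtime_minutes_per_year(0.999)))   # three nines: ~526 minutes (~8.8 hours)
print(round(downtime_minutes_per_year(0.9999)))  # four nines:  ~53 minutes
```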

Developing Encoding Products

In this section, the speaker talks about developing encoding products such as physical encoders (4K to basic SD options), firmware, software, Alta (an IP encoding product), and Falcon (a cloud-based encoder).

Developing Encoding Products

  • They continue to develop their encoding products such as physical encoders (4K to basic SD options), firmware, software, Alta (an IP encoding product), and Falcon (a cloud-based encoder).
  • Their main focus is making sure that those encoders and the firmware/software can deal with every form of video content being delivered in every form of caption insertion.
  • Alta is an absolute breakthrough product because it can take any form of IP video, encode it and then encode captions back. It's available as a virtual install so customers can install it on their kit quickly.
  • They plan to provide further product releases in Falcon to make it easier for people to stream while improving its pricing structure.

Investing in Lexi

In this section, the speaker talks about investing heavily in Lexi itself by making sure that there are more applications and value from data.

Investing in Lexi

  • Their main focus is on delivering high-quality live captioning to professional customers while being an indispensable partner to them.
  • They are investing heavily in Lexi itself by making sure that there are more applications and value from data.
  • Live Lexi has been going for a very long time now.

New Products and Market Opportunities

In this section, Tony explains the new products that Ai-Media is launching and the market opportunities they are focusing on.

New Products

  • Ai-Media is launching a new product called Lexi Library, which stitches captions to customers' media libraries. This allows customers to search their library by caption and jump to exactly the point where a particular word was said.
  • Lexi Library is an evolution of Ai-Media's sub-silo product.
  • Another new product Ai-Media is launching is Lexi Live, which provides fast rough transcripts for recorded content.

Market Opportunities

  • The broadcast industry in North America and Australia already has product-market fit for Ai-Media's services.
  • There are opportunities for market development in enterprise sectors such as cities (e.g., Baltimore, Austin, San Francisco), India (which has never done captioning before), and other industries where people have relied on human-curated captioning services.
  • Ai-Media's focus will be on scalable technology solutions, live events, and where the AI network can bring value to the customer. They are also open to partnering with other players in the industry to provide services adjacent to theirs.

Investment Focus

In this section, Tony talks about how Ai-Media plans to continue delivering premium-quality service with its infrastructure while focusing on key investment areas.

Investment Areas

  • Ai-Media will continue investing in its infrastructure, into which it has already put over $50 million across a 15-year period.
  • Ai-Media is focusing its investment on three areas: Lexi Library, Lexi Live, and further evolutions of the Lexi translate options as it gains confidence with more language pairs.

Market Demand

In this section, Florian asks Tony about where he sees the biggest demand in the market over the next few years.

Market Demand Areas

  • Ai-Media sees two sales opportunities: acceleration mode where there is already product-market fit (e.g., the broadcast industry in North America and Australia), and market development in enterprise sectors such as cities, India, and other industries where people have relied on human-curated captioning services.
  • Among the new markets Ai-Media is developing are cities, which are not broadcasters but do stream content. They are also looking at partnering with other players in the industry to provide services adjacent to theirs.

Partnerships and Localization Services

In this section, Tony discusses the partnerships that EEG has with various companies, including event partners, technology partners like Grass Valley, and third-party captioning providers like Dynamic Captioning. He also talks about how EEG works with other vendors for localization services.

EEG's Partnerships

  • EEG has a mix of partners, including event partners, technology partners like Grass Valley, and third-party captioning providers like Dynamic Captioning.
  • EEG also works with companies like Rev to provide wraparound services where quality needs to be improved overnight.
  • As long as they can deliver the level of quality and service that customers require, EEG is happy to partner with anyone.

Localization Services

  • EEG had an in-house localization service that was shut down after the acquisition.
  • If the acquisition hadn't happened, they would have gone further down that language localization route.
  • In 2020, 45% of their business was recorded media. This year it'll be less than 15%, so they're doubling down on Live.

Conclusion

Florian thanks Tony for his time and concludes the interview.

  • Florian thanks Tony for taking the time to speak with him.

Video description

Tony Abrahams, CEO and Co-founder of Ai-Media, joins SlatorPod to talk about the journey to building a market leader in multilingual live captioning. Tony discusses his transition from working in finance to co-founding Ai-Media with Alex Jones and introducing large-scale captioning to Australian Pay TV. He gives an overview of Ai-Media’s technology stack, which delivers high-quality automatic captioning through three key elements: encoding, the iCap network, and LEXI. The CEO talks about the use of respeaking versus LEXI in settings where captioning accuracy is critical, and where there are multiple speakers, mixed-quality audio, or background noise. He discusses how Ai-Media measures live-captioning quality using the NER model, which weights the types of errors as edition errors or recognition errors. Touching on the multilingual component of Ai-Media, Tony explores the possibility of using AI instead of respeakers and having a fully automated translation product in the near future. He believes that large language models are an opportunity as the technology has enabled them to interpret sentences more accurately, resulting in a better outcome with LEXI 3.0. Tony gives his thoughts on growing through M&A and the strategy behind acquiring EEG to gain a competitive advantage in terms of its technology and product suite. He shares his rationale for taking Ai-Media public. The CEO reveals Ai-Media’s roadmap for 2023, such as improving the iCap network and launching the LEXI Library, which allows customers to search their media library by captions.

Ai-Media: https://www.ai-media.tv/

Chapter Markers:

  • 00:00:00 Intro and Agenda
  • 00:00:58 Background and Route into Language Access
  • 00:06:45 When to Use Respeaking
  • 00:11:16 Ai-Media's Tech Platform
  • 00:14:36 Enterprise vs. Consumer SaaS
  • 00:16:18 Thoughts on Whisper Launch
  • 00:17:51 Setting Quality KPIs
  • 00:22:10 Multilingual Captioning
  • 00:25:49 Experience with ChatGPT
  • 00:27:44 Scaling Through M&A
  • 00:32:24 Pros and Cons of Being a Listed LSP
  • 00:34:20 Roadmap for 2023
  • 00:40:10 Future Market Demand
  • 00:43:44 Ai-Media's Partners