# 147 LXT’s Phil Hall on the 2023 AI Boom, Covering 750 Languages, and ChatGPT
Introduction to LXT and Phil Hall
Overview of the Podcast
- The podcast introduces Phil Hall, Chief Growth Officer at LXT, a leader in AI training data based in Ontario, Canada.
- Phil is currently recording from Sydney, Australia, where he has lived for many years.
Background of Phil Hall
- Phil shares his previous experience with Appen, an Australian company where he was employee number three and worked for 17 years.
- He highlights that although he is based in Australia, LXT was open to hiring him because of his expertise.
LXT Company History and Growth
Founding and Development
- LXT was founded in 2010 to meet the demand for high-quality Arabic data from a major tech company.
- The founder, Mohammad Omar, expanded services from Arabic data collection to include annotation across 250 languages over ten years.
Current Operations
- LXT has around 500 employees: approximately 70 professional staff and numerous annotators working in secure facilities.
- The company aims to transition from a service-oriented business model towards becoming more technology-focused.
Phil's Role as Chief Growth Officer
Responsibilities and Goals
- As Chief Growth Officer, Phil oversees sales, marketing, and corporate development efforts aimed at expanding the customer base.
- Under his leadership, LXT now operates across five main use cases: AR/VR, computer vision, conversational AI, search relevance, and speech and NLP.
Current Strengths and Market Trends
Areas of Expertise
- LXT maintains strong capabilities in the speech domain while rapidly expanding into image and video-related services.
Competitive Landscape
- The image/video market is described as crowded, with low barriers to entry, which makes it far more competitive than the less saturated market for language-based services.
Challenges in Language Services
Complexity of Language Annotation
- Language annotation requires building a global service network, which takes far longer than an image/video operation that can be set up quickly.
Volume of Languages Managed
Growth in Language Diversity at LXT
Expansion of Language Offerings
- Phil notes that LXT expanded from 250 languages to 750 by 2022, and that this growth created challenges for backend systems and payment processes.
- Despite his background in linguistics, he admits he is unfamiliar with many of the lesser-known languages, particularly those from Africa and India.
Language Presence on the Internet
- Phil emphasizes that while some languages are studied for preservation, LXT focuses on languages that have an online presence, a practical approach to language work.
Annotation Proficiency Requirements
- Generally, native speakers are preferred for labeling and annotation tasks; however, non-native speakers can be utilized in certain cases to reduce costs.
- The cost of using native speakers varies by dialect; for example, Egyptian Arabic may be less expensive than Saudi or UAE Arabic.
Understanding Search Relevance Services
Definition and Importance
- Search relevance encompasses user intent analysis and ranking search results based on what users actually mean versus what they type.
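As a rough illustration of how human relevance judgments can feed into ranking evaluation, the sketch below scores two orderings of the same results with normalized discounted cumulative gain (NDCG). NDCG is a commonly used ranking metric, but the episode does not say which metrics LXT or its clients use; the metric choice, the 0 to 3 grading scale, and the example grades are all assumptions made for illustration.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: each grade is discounted by log2(rank + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (best-first) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical annotator grades (0 = irrelevant, 3 = exactly what the user meant),
# listed in the order each ranker displayed the results.
literal_match_ranking = [1, 0, 3, 2]   # matches the typed words, misses the intent
intent_aware_ranking = [3, 2, 1, 0]    # best-matching results shown first

score_literal = ndcg(literal_match_ranking)
score_intent = ndcg(intent_aware_ranking)
print(f"literal: {score_literal:.3f}  intent-aware: {score_intent:.3f}")
```

A ranking that places the results annotators judged most relevant at the top scores closer to 1.0, which is one way labeled data can be used to compare what users meant against what a system returned.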
Ongoing Need for Data Freshness
- The need for fresh data is crucial as algorithms require continuous updates; old data becomes ineffective over time.
Adapting to Emerging Vocabulary
Impact of Current Events
- New vocabulary emerges rapidly due to current events (e.g., COVID), necessitating ongoing retraining in various business sectors like entertainment.
Translation Work Context
Role in Machine Translation
- While LXT contributes to machine translation applications, it does not specialize in translation or localization due to the complexity and specialization required in that field.
Client Needs Versus Expertise
- Clients often seek comprehensive packages involving speech, text, and translation services from one vendor rather than multiple sources.
Survey Insights: Path to AI Maturity
Executive Self-Assessments
AI Adoption Insights and Industry Trends
Rejection Rates and Self-Perception in AI Knowledge
- A high rejection rate was noted, with 800 applicants failing screening tests while only 200 were accepted, highlighting a gap between self-perception and actual knowledge of AI.
- Fewer than 40% of participants had reached the higher levels of AI maturity; the survey distinguished those at advanced stages from those still experimenting or aspiring.
Contrasting Perspectives on Data Utilization
- The aspirational group believed they could achieve significant results using unannotated data for unsupervised learning, while the mature group emphasized the necessity of annotated data for supervised or semi-supervised learning.
- The contrast indicates that those inexperienced in AI often underestimate the costs associated with effective data utilization.
Financial Sector as a Leader in AI Adoption
- The financial industry is seen as a trailblazer in AI adoption, driven by strong motivations such as fraud detection needs.
- Fraud detection is crucial for financial institutions, providing them with an incentive to invest heavily in AI technologies.
Structured vs. Unstructured Data Challenges
- Financial data is typically structured and comes with metadata, making it easier to apply machine learning compared to unstructured data like speech.
- This inherent structure allows financial firms to engage with machine learning more readily than sectors reliant on unstructured data.
Emergence of Chief AI Officer Roles
- Roles such as Chief AI Officer are beginning to emerge, although today AI initiatives are mostly driven by Chief Information Officers and Chief Data Officers.
- The rapid evolution of these roles suggests that organizational structures around AI are likely to change quickly due to increasing interest and investment.
Impact of ChatGPT on Public Awareness of AI
- ChatGPT has generated significant buzz across social media platforms, indicating a shift from niche interest in AI to broader public awareness.
AI Development and Competitive Landscape
Google vs. OpenAI: Caution in AI Deployment
- Recent discussions suggest that Google has been more cautious in deploying its AI models, such as PaLM, than OpenAI has been with ChatGPT.
- Concerns about potential backlash from users if the AI does not meet expectations may have influenced Google's hesitance.
Ethical Considerations in AI Responses
- A significant difference between GPT-3 and ChatGPT is how they handle hate speech and inappropriate content; ChatGPT actively shuts down such inquiries.
- The importance of human annotation in training these systems is often overlooked, highlighting the need for ethical considerations in data handling.
Competitors in the AI Space
- Discussion on emerging competitors like Scale.AI, SurgeHQ, and Snorkel raises questions about their impact on the industry landscape.
- There is often a disparity between marketing claims of these companies and their actual technological capabilities.
Technology vs. Services: The Case of Scale.AI
- While Scale.AI appears strong technologically, there are indications that it may still depend heavily on services work despite its public image as a technology company.
Framework for Understanding the AI Ecosystem
- Sridhar Ramaswamy categorizes the current AI ecosystem into five segments, including foundation model players (like OpenAI), frontend startups (like Jasper), tooling companies (like Scale.AI), and major cloud providers (Google, Azure, AWS).
- Data labeling companies are likened to "shovel providers" during a gold rush—essential but often underappreciated contributors to the industry.
M&A Strategies in Growing Companies
- LXT considers mergers and acquisitions as part of its growth strategy but acknowledges the extensive effort required to find suitable opportunities.
Building vs. Buying Technology
Internal Development and Acquisition Strategy
- The organization is currently focusing on building technology internally, but acknowledges its small size limits resources.
- There is an interest in acquiring companies with emerging technologies that could leverage the organization's extensive data resources.
- Successful technology requires not just funding but also access to client volumes, which can take years to develop.
Ethical Considerations in Data Collection
- The organization maintains strict ethical standards in data collection, ensuring proper permissions are obtained.
- They engage in both data collection and annotation, often using live data from clients' end users.
Shift Towards Secure Data Annotation
- A notable trend has emerged where Big Tech companies are investing more in secure facilities for data annotation rather than relying solely on crowdsourcing.
- The organization has expanded its secure facilities from none to five locations within a year and a half, creating a competitive advantage.
Synthetic Data and Future Trends
Observations on Synthetic Data Usage
- There is growing interest in synthetic data generation for training AI models, particularly noted in image and video sectors.
- While the organization recognizes the importance of synthetic data, it currently lacks capabilities in this area and is exploring development options.
Predictions for AI Industry Growth
- Phil believes the AI industry is still early in its growth phase and has not yet reached peak hype.
- He emphasizes the ongoing need for data infrastructure, expecting demand to persist beyond his own lifetime.
Challenges with AI Deployment
- Major tech companies face difficulties managing AI developments due to inherent risks associated with rapid advancements.