Kalika Bali: The giant leaps in language technology -- and who's left behind | TED

Introduction to Language Technology

Overview of the Speaker

  • Kalika Bali introduces herself as a linguist and technologist with over two decades of experience in academia, startups, and multinationals.
  • She works at Microsoft Research Labs India, focusing on language technology and speech technology.

Accessibility in Technology

  • Bali is concerned with making technology accessible across different languages, and she stresses the central role of natural language processing (NLP) in doing so.
  • NLP is defined as a branch of computer science that enables machines to process, understand, and generate human language.

Understanding Natural Language Processing

Functionality of NLP

  • Examples of NLP in everyday use include booking tickets through chatbots and talking to voice assistants.

Mechanism Behind NLP

  • Modern NLP relies on algorithms that learn the patterns of human language from vast amounts of data.
  • Bali highlights deep neural networks as the techniques driving today's advances in NLP.
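The idea of learning language patterns from data can be illustrated with a far simpler technique than a deep neural network. The sketch below is a count-based bigram model, not a neural network and not a system described in the talk; the function names and the toy corpus are hypothetical. It estimates how likely one word is to follow another purely from counts, so its estimates are only as good as the data it has seen.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word follows another, with sentence-boundary markers."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            counts[prev][curr] += 1
    return counts


def next_word_probability(counts, prev, curr):
    """Maximum-likelihood estimate of P(curr | prev); 0.0 if prev was never seen."""
    following = counts.get(prev)
    if not following:
        return 0.0
    return following[curr] / sum(following.values())


# Toy corpus invented for illustration.
corpus = [
    "book a ticket to delhi",
    "book a ticket to mumbai",
    "book a table for two",
]
model = train_bigrams(corpus)
print(next_word_probability(model, "a", "ticket"))
print(next_word_probability(model, "to", "chennai"))
```

With only three sentences, the model has never seen "to chennai", so it assigns that continuation zero probability: a small-scale version of why data-hungry systems struggle with languages that have little text available.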

Data Requirements for Effective NLP

Importance of Data

  • Successful speech systems require extensive datasets; for instance, Microsoft's 2017 system was trained on 200 million transcribed words.

Translation Systems

  • An English-Chinese translation system developed in 2018 achieved human-level performance using 18 million bilingual sentence pairs.

The Digital Divide in Language Resources

Resource Distribution Across Languages

  • Monojit Choudhury's research indicates a power-law distribution where only four languages (Arabic, Chinese, English, Spanish) have abundant resources.
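A power-law distribution means that a handful of top-ranked items hold most of the total. The sketch below assumes that the resources available for the language at rank r are proportional to r^(-alpha); the function name, the number of languages, and the exponent are illustrative assumptions, not figures from Choudhury's research.

```python
def top_k_share(num_languages, k, alpha):
    """Fraction of total resources held by the top-k languages, assuming the
    language at rank r holds resources proportional to r ** -alpha."""
    weights = [r ** -alpha for r in range(1, num_languages + 1)]
    return sum(weights[:k]) / sum(weights)


# Hypothetical parameters: about 7,000 languages, exponent 2.0.
share = top_k_share(7000, 4, 2.0)
print(f"Top 4 languages hold {share:.0%} of resources")
```

Raising alpha concentrates resources further in the top ranks; the real exponent for language resources would have to be estimated from measured data.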

Impact on Lesser-Known Languages

  • Approximately 90% of the world's languages lack adequate technological resources, leading to a widening digital divide between communities.

Project Ellora: Bridging the Gap

Goals and Initiatives

  • Project Ellora aims to create more data through innovative methods and develop technologies for resource-poor languages.

Case Study: Gondi Language

  • Gondi, a vulnerable South-Central Dravidian language spoken by three million people in India, lacks technology support.

Community Involvement

  • Working with NGOs such as CGNet Swara, the project produced children's books in Gondi, giving local communities access to stories in their own language.

Technological Innovations

Innovative Language Technology for Underserved Communities

Bridging Language Gaps with Technology

  • The app uses a Hindi text-to-speech system to read aloud news and articles written in Gondi, giving community members access to information in their native language.
  • A community-driven effort lets users translate text from Hindi into Gondi. The resulting parallel data is essential for building machine translation systems for Gondi, opening new channels of communication for the Gond community.
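The parallel data described above is a list of aligned sentence pairs. The sketch below shows one plausible way to clean community submissions into such a corpus: normalizing whitespace, dropping empty translations, and removing duplicates. The class and function names are hypothetical, placeholder strings stand in for real Hindi and Gondi sentences, and the app's actual data format is not described in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationPair:
    """One aligned sentence pair (hypothetical structure)."""
    hindi: str
    gondi: str


def normalize(text):
    """Collapse runs of whitespace (including tabs and newlines) into single spaces."""
    return " ".join(text.split())


def build_parallel_corpus(submissions):
    """Clean raw (hindi, gondi) submissions into a de-duplicated list of pairs,
    skipping any pair where either side is empty after normalization."""
    seen = set()
    corpus = []
    for hindi, gondi in submissions:
        pair = TranslationPair(normalize(hindi), normalize(gondi))
        if not pair.hindi or not pair.gondi or pair in seen:
            continue
        seen.add(pair)
        corpus.append(pair)
    return corpus


def to_tsv(corpus):
    """Serialize pairs as tab-separated lines, a common layout for MT training data."""
    return "\n".join(f"{p.hindi}\t{p.gondi}" for p in corpus)


# Placeholder strings stand in for real Hindi sentences and Gondi translations.
submissions = [
    ("hindi sentence 1", "gondi translation 1"),
    ("  hindi   sentence 1 ", "gondi translation 1"),  # duplicate after cleanup
    ("hindi sentence 2", "   "),  # empty translation, dropped
]
corpus = build_parallel_corpus(submissions)
print(to_tsv(corpus))
```

Normalizing before de-duplicating matters here: the second submission differs from the first only in spacing, so without cleanup it would be counted as a separate training pair.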

Empowering Livelihoods through Digital Tools

  • Researchers Vivek Seshadri and Manu Chopra created Karya, a platform that offers digital microtasks to underserved communities, providing dignified work for the rural and urban poor who lack digital literacy.
  • Karya acts as a bridge into the digital world: people from these communities complete tasks and earn money, improving their economic situation. The same platform can also be used to collect language data.

Lessons from Data Collection in Rural Areas

  • A data collection trip to Amale village in Maharashtra succeeded even though the village was remote and lacked basic amenities such as electricity and mobile signal; residents contributed valuable Marathi language data and expressed pride in their linguistic heritage.
  • The project highlighted three key lessons:
      • Pride in Language: Community members were enthusiastic about advancing their own language.
      • Community Effort: Data collection became a collective activity that brought the villagers together.
      • Importance of Storytelling: Villagers held storytelling sessions using Karya, revealing a deep cultural need to create and share content within the community.

Understanding User Needs Beyond Technology

  • Bali stresses that technology must keep people at its core, putting user needs ahead of technical advancement for its own sake, and that social and cultural factors play a significant role alongside the technology itself.

Challenges Encountered During Implementation

  • In VideoKheti, an agricultural video search project for Hindi-speaking farmers, the first trained models performed poorly because the recordings picked up unexpected background noise from insects at night, showing how strongly the recording environment affects data quality.
  • The team also found a gap between the formal agricultural terminology used by extension centers and the local vernacular terms familiar to women farmers, underscoring the need to understand users' context when building technology for a specific community.
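Problems like the night-time insect noise can sometimes be caught before training by screening recordings automatically. The sketch below is a crude heuristic, not the method used in VideoKheti: it treats the quietest frames of a clip as the background noise floor and compares the clip's average energy with it. The function names, frame size, 10% quiet fraction, 10 dB threshold, and synthetic test signal are all assumptions chosen for illustration.

```python
import math


def frame_energies(samples, frame_size):
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_size]) / frame_size
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def noise_floor_ratio_db(samples, frame_size=400, quiet_fraction=0.1):
    """Rough SNR proxy in decibels: mean energy of all frames divided by the
    mean energy of the quietest frames, which are treated as background noise."""
    energies = sorted(frame_energies(samples, frame_size))
    if not energies:
        raise ValueError("recording is shorter than one frame")
    n_quiet = max(1, int(len(energies) * quiet_fraction))
    noise_floor = sum(energies[:n_quiet]) / n_quiet
    overall = sum(energies) / len(energies)
    if noise_floor == 0:
        return math.inf  # perfectly silent background
    return 10 * math.log10(overall / noise_floor)


def add_hiss(samples, level):
    """Add a steady alternating +/-level buzz, a crude stand-in for insect noise."""
    return [s + (level if i % 2 else -level) for i, s in enumerate(samples)]


# Synthetic 3-second clip at 8 kHz: a 200 Hz tone in the middle second stands in for speech.
RATE = 8000
speech = [
    0.5 * math.sin(2 * math.pi * 200 * t / RATE) if RATE <= t < 2 * RATE else 0.0
    for t in range(3 * RATE)
]
clean = add_hiss(speech, 0.01)
noisy = add_hiss(speech, 0.3)

THRESHOLD_DB = 10  # hypothetical cut-off for flagging a recording
for name, clip in [("clean", clean), ("noisy", noisy)]:
    ratio = noise_floor_ratio_db(clip)
    print(f"{name}: {ratio:.1f} dB, flagged={ratio < THRESHOLD_DB}")
```

A steady buzz raises the noise floor across the whole clip, so the ratio collapses toward 0 dB even though the "speech" segment is unchanged.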

Strategic Approach Towards Language Technology Development

  • To allocate resources effectively and ensure positive social impact, Bali's team uses a modified 4-D design-thinking methodology:
      • Discover: Identify problems that language technology can address within specific communities.
      • Design: Tailor solutions to user needs and linguistic diversity.

Language Revitalization and Perseverance

The Importance of Adaptation in Language Development

  • Emphasizes rapid development and frequent deployment in language projects: an iterative process in which failing quickly leads to eventual success.
  • Challenges the misconception that language tools are suitable only for English, advocating their adaptation to other languages such as Marathi or Gondi.

A Story of Resilience: Patricia O'Connor and Ysola Best

  • Shares the inspiring story of two Australian Aboriginal women who faced discouragement while trying to revive their native language, Yugambeh.
  • Despite being told their language was "dead" and not worth pursuing, they persevered by engaging with their community to recover oral traditions and literature.

Channel: TED
Video description

Thousands of languages thrive across the globe, yet modern speech technology -- with all of its benefits -- supports just over a hundred. Computational linguist Kalika Bali dreams of a day when technology acts as a bridge instead of a barrier, working passionately to build new and inclusive systems for the millions who speak low-resource languages. In this perspective-shifting talk, she outlines what happens when a language is omitted from the digital landscape -- and what can be gained when communities keep pace with the future.

TED's videos may be used for non-commercial purposes under a Creative Commons License, Attribution–Non Commercial–No Derivatives (or the CC BY – NC – ND 4.0 International) and in accordance with our TED Talks Usage Policy (https://www.ted.com/about/our-organization/our-policies-terms/ted-talks-usage-policy).