Kalika Bali: The giant leaps in language technology -- and who's left behind | TED
Introduction to Language Technology
Overview of the Speaker
- Kalika Bali introduces herself as a linguist and technologist with over two decades of experience in academia, startups, and multinationals.
- She works at Microsoft Research Labs India, focusing on language technology and speech technology.
Accessibility in Technology
- Bali expresses concern about making technology accessible across different languages, emphasizing the importance of natural language processing (NLP).
- NLP is defined as a branch of computer science that enables machines to process, understand, and generate human language.
Understanding Natural Language Processing
Functionality of NLP
- Examples of NLP in everyday use include booking tickets through bots and interacting with voice assistants.
Mechanism Behind NLP
- NLP relies on algorithms that process vast amounts of data to identify patterns in human language.
- Deep neural networks are highlighted as the technique driving current progress in NLP.
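The idea of finding patterns in language data can be illustrated with a deliberately tiny sketch. It counts which word follows which in a toy corpus (a bigram count), which is far simpler than the deep neural networks mentioned in the talk and is not the method of any system described there. The corpus and function names are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count which word follows which across a list of sentences."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following


def most_likely_next(following, word):
    """Return the most frequently observed next word, or None if unseen."""
    counts = following.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Toy corpus, invented for illustration only.
corpus = [
    "book a ticket to Delhi",
    "book a ticket to Mumbai",
    "book a table for two",
]
model = train_bigrams(corpus)
print(most_likely_next(model, "book"))  # a
print(most_likely_next(model, "a"))     # ticket
```

With only three sentences the counts are already enough to guess that "ticket" tends to follow "a". Real systems need vastly more data because human language has vastly more patterns, which is the point the next section makes.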
Data Requirements for Effective NLP
Importance of Data
- Successful speech systems require extensive datasets; for instance, Microsoft's 2017 speech recognition system was trained on 200 million transcribed words.
Translation Systems
- An English-Chinese translation system developed in 2018 achieved human-level performance using 18 million bilingual sentence pairs.
The Digital Divide in Language Resources
Resource Distribution Across Languages
- Monojit Choudhury's research indicates a power-law distribution where only four languages (Arabic, Chinese, English, Spanish) have abundant resources.
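To give a feel for what a power-law distribution implies, here is a minimal sketch. It assumes resources fall off as rank to the power of minus alpha; the number of languages and the exponent are hypothetical values chosen for illustration and are not figures from Choudhury's research.

```python
def power_law_shares(num_languages, alpha):
    """Return each rank's share of total resources under a rank**-alpha model."""
    weights = [rank ** -alpha for rank in range(1, num_languages + 1)]
    total = sum(weights)
    return [weight / total for weight in weights]


# Hypothetical parameters, not taken from the talk or the underlying study.
shares = power_law_shares(num_languages=7000, alpha=1.5)

top_four = sum(shares[:4])
print(f"Top 4 languages hold {top_four:.0%} of resources in this toy model")
print(f"Language ranked 1000th holds {shares[999]:.6%}")
```

Under these assumed parameters a handful of top-ranked languages account for most of the total, while languages far down the ranking receive a vanishingly small share. That shape is what makes the divide described in the next section so stark.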
Impact on Lesser-Known Languages
- Approximately 90% of the world's languages lack sufficient technological resources, which is widening the digital divide between communities.
Project Ellora: Bridging the Gap
Goals and Initiatives
- Project Ellora aims to create more data through innovative methods and develop technologies for resource-poor languages.
Case Study: Gondi Language
- Gondi is a vulnerable South-Central Dravidian language spoken by three million people in India, yet it lacks language technology support.
Community Involvement
- Collaboration with NGOs like CGNet Swara led to the creation of children's books in Gondi for local access to stories.
Technological Innovations
Innovative Language Technology for Underserved Communities
Bridging Language Gaps with Technology
- The app uses a Hindi text-to-speech system to read news and articles aloud in Gondi, giving community members access to information in their native language.
- Community-driven translation efforts are underway: users translate text from Hindi to Gondi, generating the parallel data needed to build machine translation systems for Gondi. This opens new avenues of communication for the Gond community.
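Parallel data is simply a collection of sentences paired with their translations. The sketch below shows one way community submissions might be cleaned and stored, assuming a tab-separated format. The `SentencePair` type, the function names, and the placeholder strings are all hypothetical; the talk does not describe how the project actually stores its data.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class SentencePair:
    source: str  # Hindi sentence shown to the volunteer
    target: str  # Gondi translation supplied by the volunteer


def clean_pairs(raw_pairs):
    """Strip whitespace, drop empty or duplicate pairs, keep first-seen order."""
    seen = set()
    cleaned = []
    for source, target in raw_pairs:
        pair = SentencePair(source.strip(), target.strip())
        if not pair.source or not pair.target or pair in seen:
            continue
        seen.add(pair)
        cleaned.append(pair)
    return cleaned


def to_tsv(pairs):
    """Serialize pairs as one tab-separated source/target line each."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for pair in pairs:
        writer.writerow([pair.source, pair.target])
    return buffer.getvalue()


# Placeholder text stands in for real sentences; no real data is shown here.
submissions = [
    ("<Hindi sentence 1>", "<Gondi translation 1>"),
    ("<Hindi sentence 1> ", "<Gondi translation 1>"),  # duplicate after stripping
    ("<Hindi sentence 2>", ""),                         # untranslated, dropped
    ("<Hindi sentence 3>", "<Gondi translation 3>"),
]
pairs = clean_pairs(submissions)
print(len(pairs))  # 2
print(to_tsv(pairs), end="")
```

A machine translation system learns from many such aligned pairs, which is why each community translation adds directly to what can be built for the language.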
Empowering Livelihoods through Digital Tools
- Researchers Vivek Seshadri and Manu Chopra created Karya, a platform that provides digital microtasks to underserved communities, offering dignified work to rural and urban poor populations who lack access to digital knowledge.
- Karya serves as a bridge into the digital world: people from these communities earn money by completing tasks, which improves their economic situation. Beyond employment, the platform can also be used for data collection.
Lessons from Data Collection in Rural Areas
- A successful data collection trip was conducted in Amale village (Maharashtra). Although the village is remote and lacks basic amenities such as electricity and mobile signal, residents contributed valuable Marathi language data and expressed pride in their linguistic heritage.
- The project highlighted three key lessons:
  - Pride in Language: Community members were enthusiastic about advancing their own language.
  - Community Effort: Data collection became a collective activity that fostered unity among villagers.
  - Storytelling Importance: Villagers engaged in storytelling sessions using Karya, showcasing a deep cultural need for content creation and sharing within the community.
Understanding User Needs Beyond Technology
- The speaker emphasizes that technology should prioritize user needs rather than just technical advancements; successful technology must keep people at its core while recognizing that social and cultural factors play significant roles alongside technological solutions.
Challenges Encountered During Implementation
- In an agricultural video search project named VideoKheti aimed at Hindi-speaking farmers, initial model training yielded poor results due to unexpected background noise from night insects during recordings, highlighting the importance of environmental factors on data quality.
- Additionally, discrepancies arose between formal agricultural terminology used by extension centers and local vernacular terms familiar to women farmers; this gap underscored the necessity of understanding user context when developing technology solutions tailored for specific communities.
Strategic Approach Towards Language Technology Development
- To ensure effective resource allocation and positive social impact within language tech initiatives, a modified 4-D design thinking methodology is employed:
  - Discover: Identify problems that language technology can address within specific communities.
  - Design: Tailor solutions based on user needs and linguistic diversity.
Language Revitalization and Perseverance
The Importance of Adaptation in Language Development
- For the remaining two steps of the 4-D methodology, Develop and Deploy, Bali emphasizes rapid development and frequent deployment: an iterative process in which quick failures pave the way to eventual success.
- She highlights the misconception that language tools are only suitable for English, advocating for adaptation to other languages like Marathi or Gondi.
A Story of Resilience: Patricia O'Connor and Ysola Best
- Bali shares the inspiring story of two Australian Aboriginal women who faced discouragement while trying to revive their native language, Yugambeh.
- Despite being told their language was "dead" and not worth pursuing, they persevered by engaging with their community to recover oral traditions and literature.