CIO Tech Outlook Team | Monday, 04 December 2023, 09:15 IST
For several weeks this year, villagers in south-eastern Karnataka read sentences aloud into an app in their native language, Kannada, as part of a project to create the first artificial intelligence chatbot for tuberculosis treatment. There are more than 40 million native speakers of Kannada in India. It is one of India's 22 official languages and one of 121 languages spoken by more than 10,000 people in the world's most populous country.
However, only some of these languages are covered by natural language processing (NLP), a branch of artificial intelligence that allows computers to understand text and spoken words. As a result, hundreds of millions of Indians are excluded from useful information and many economic opportunities.
"For AI tools to work for everyone, they need to also cater to people who don't speak English or French or Spanish," said Kalika Bali, principal researcher at Microsoft Research India.
"But if we had to collect as much data in Indian languages as went into a large language model like GPT, we'd be waiting another 10 years. So what we can do is create layers on top of generative AI models such as ChatGPT or Llama," Bali told the Thomson Reuters Foundation.
The villagers in Karnataka are among thousands of speakers of Indian languages producing data for technology company Karya, which creates datasets for companies such as Microsoft and Google to use in AI models for education, health and other services.
The Indian government, seeking to make digital services more accessible, is also building language datasets through Bhashini, an AI-powered language translation system that generates open-source datasets in local languages for building AI tools.
The platform includes a crowdsourcing program that allows people to contribute sentences in different languages, validate audio or text written by others, and translate text and subtitles.