Several widely used global artificial intelligence (AI) systems struggle with Indian languages, accents and dialects, even as voice-based interfaces are increasingly being used in public services and consumer applications, according to an AI benchmark report. Called Voice of India, the sovereign benchmark was developed by Josh Talks in collaboration with AI4Bharat at IIT Madras and evaluates automatic speech recognition (ASR) systems across 15 Indian languages using speech from more than 35,000 speakers.
The benchmark has reportedly indicated a wide gap in performance between India-focused models and several global systems, particularly on regional languages and dialects, and points to continuing limitations in how current speech models handle real-world Indian speech.
What is Voice of India
Voice of India is a speech recognition benchmark designed to test how well AI systems transcribe speech as it is actually spoken in India. The dataset covers 15 Indian languages and includes audio from over 35,000 speakers, with around 2,000 speakers per language. Unlike many existing benchmarks that rely on clean, read-out speech, Voice of India uses conversational and spontaneous speech that includes background noise, code-mixed language and regional variation.
The benchmark evaluates models across both major languages such as Hindi and Bengali and regional ones such as Odia and Assamese. It also includes dialect-level testing, including variants such as Bhojpuri and Chhattisgarhi, to measure how systems perform beyond standardised forms of a language.
What does the benchmark show
According to the results shared, Sarvam Audio, the speech recognition model developed by Indian startup Sarvam AI, ranked first or second across most of the languages and dialects tested. Google’s Gemini models performed closer to the Indian systems, while other global models showed significantly higher error rates in several languages. In some cases, the gap between Sarvam’s model and OpenAI’s GPT-4o transcription systems exceeded 50 percentage points in overall average accuracy.
The benchmark also highlights differences across language families. All tested systems perform better on Indo-Aryan languages such as Hindi and Bengali, where word error rates are lower, than on Dravidian languages such as Tamil, Telugu, Malayalam and Kannada, where error rates rise sharply. In dialect tests, even the best-performing models saw error rates climb to 20–30 per cent for dialects such as Bhojpuri, compared with under 10 per cent for standard Hindi.
Why such a benchmark is required
The release of Voice of India comes at a time when voice is increasingly being used as an interface for services ranging from customer support and banking to healthcare and government programmes. In such settings, transcription errors are not just a technical issue. A word error rate of 20–30 per cent can mean that names, locations, numbers or instructions are recorded incorrectly, with direct implications for service delivery.
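To make the figure concrete, word error rate is conventionally calculated as the number of word substitutions, deletions and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below shows that standard calculation. It is an illustration only, not the benchmark's own scoring code, and the function name and example sentences are invented for this explainer.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from transcript
            insertion = curr[j - 1] + 1  # extra word in transcript
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Invented example: two wrong words out of ten.
reference = "send two thousand rupees to ravi kumar in patna today"
hypothesis = "send ten thousand rupees to ravi kumar in satna today"
print(word_error_rate(reference, hypothesis))
```

In this made-up case only two of ten words are wrong, a 20 per cent error rate, yet both the amount and the city have changed.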
The benchmark’s findings suggest that many global speech models, which are largely trained on Western or standardised datasets, still struggle with Indian accents, code-mixed speech and regional variation. For example, the results show that several systems either perform poorly or do not support a number of Indian languages at all, limiting their usefulness in large parts of the country.
“This is one of the most rigorous large-scale evaluations of speech recognition for Indian languages, containing district level cohorts with balanced representation across gender and age to truly reflect India’s diversity,” said Mitesh Khapra of AI4Bharat at IIT Madras. “Further, recognising that conventional word error rate can unfairly penalize code mixed and multilingual speech, we manually curated multiple valid spelling variants for transcripts, ensuring models are judged for linguistic correctness rather than orthographic variation. This human intensive effort sets a new benchmark for fair and representative ASR evaluation in India.”
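One way to picture the scoring approach Khapra describes is an error rate in which each reference word carries a set of accepted spellings, and a transcribed word counts as correct if it matches any of them. The Python sketch below rests on that assumption. The benchmark's actual scoring method is not detailed in the results shared, and the function name, data layout and example sentence here are hypothetical.

```python
def variant_aware_wer(reference_variants: list[set[str]], hypothesis: str) -> float:
    """Word error rate where each reference position accepts several spellings.

    reference_variants is a hypothetical layout: one set of accepted
    spellings per reference word, in order.
    """
    hyp = hypothesis.split()
    if not reference_variants:
        raise ValueError("reference must contain at least one word")

    # Standard word-level edit distance, except that a hypothesis word
    # matching any accepted variant costs nothing.
    prev = list(range(len(hyp) + 1))
    for i, accepted in enumerate(reference_variants, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if hyp_word in accepted else 1)
            curr[j] = min(substitution, prev[j] + 1, curr[j - 1] + 1)
        prev = curr
    return prev[-1] / len(reference_variants)


# Invented code-mixed example: "doctor" may be written in either script.
variants = [{"मुझे"}, {"डॉक्टर", "doctor"}, {"से"}, {"मिलना"}, {"है"}]
strict = [{"मुझे"}, {"डॉक्टर"}, {"से"}, {"मिलना"}, {"है"}]
transcript = "मुझे doctor से मिलना है"

print(variant_aware_wer(variants, transcript))  # Latin-script "doctor" accepted
print(variant_aware_wer(strict, transcript))    # Latin-script "doctor" penalised
```

With the variant list, the transcript is scored as fully correct. Against a single Devanagari-only reference, the same transcript is charged one error in five words, even though nothing was misheard.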
What is AI4Bharat
AI4Bharat is a research initiative based at IIT Madras that focuses on building open and inclusive AI systems for Indian languages. The group has been involved in creating datasets, benchmarks and models for both text and speech, aimed at improving how AI systems handle India’s linguistic diversity.