India's Sarvam AI reportedly beats ChatGPT, Gemini in key benchmark tests

Indian AI startup Sarvam AI has reported strong performance on a set of benchmarks focused on document understanding and Indian languages, putting its models ahead of several widely used systems on those specific tests. The results come from evaluations covering optical character recognition (OCR), document layout understanding, and Indic language processing, areas where global models often face accuracy issues with non-Latin scripts and complex page structures.

 

The benchmarks include document OCR and layout understanding tests such as olmOCR-Bench, along with internal and public evaluations covering multi-script Indian documents, tables, and mixed-layout pages. In these tests, Sarvam’s document model posted higher accuracy scores than several general-purpose vision and language models on the same tasks. 


The benchmarks were released ahead of the India AI Impact Summit, which begins on February 16 in New Delhi, where the startup is expected to showcase the capabilities of its sovereign AI models in the expo zone.

 


Sarvam AI


Sarvam AI is a Bengaluru-based startup founded in 2023 that builds language and multimodal AI systems focused on Indian use cases. The company works on models for document processing, speech, and language understanding, with training data drawn from Indian languages, scripts, and real-world material such as documents, textbooks, newspapers, and scanned records.


Instead of building a single general-purpose chatbot, Sarvam has focused on task-specific systems for areas like OCR, document parsing, speech synthesis, speech recognition, and translation, where performance depends heavily on how well models handle local languages and formats.


Sarvam Vision: Document-focused AI model


The main model behind the document benchmark results is Sarvam Vision, a vision-language model designed for document understanding rather than basic text extraction. Unlike traditional OCR systems that output plain text, Sarvam Vision is built to interpret layout, reading order, tables, charts, and structured elements in scanned or photographed documents.

 


Sarvam said the model is a three-billion-parameter system trained on a mix of real and synthetic documents, including textbooks, financial records, government documents, magazines, newspapers, and historical material, across multiple Indian languages and English. The system combines a core vision-language model with separate components for layout parsing and reading-order detection, which are used to reconstruct documents in a structured form.
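Sarvam has not published interfaces for these components, but the description maps onto a conventional parse, order, transcribe and reassemble pipeline. The sketch below is illustrative only, assuming hypothetical layout_model, order_model and vlm objects; none of the names are drawn from Sarvam's actual code or APIs.

```python
from dataclasses import dataclass

@dataclass
class Region:
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) page coordinates
    kind: str                                # e.g. "paragraph", "table", "chart"

def parse_document(page_image, layout_model, order_model, vlm) -> str:
    """Illustrative pipeline; every model interface here is hypothetical."""
    # 1. Layout parsing: find structural regions on the page.
    regions = layout_model.detect(page_image)

    # 2. Reading-order detection: sequence regions the way a human would
    #    read them, which is non-trivial on multi-column or mixed pages.
    ordered = order_model.sort(regions, page_image)

    # 3. Vision-language transcription: emit structured output per region
    #    (e.g. Markdown tables) rather than a flat text dump.
    parts = [vlm.transcribe(page_image, region=r, output="markdown")
             for r in ordered]

    # 4. Reconstruct the document in reading order.
    return "\n\n".join(parts)
```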

 


On olmOCR-Bench, a practical benchmark designed to test real-world OCR and document understanding performance, Sarvam Vision reported an accuracy score of 84.3 per cent. This was higher than the scores posted by several general-purpose models evaluated on the same benchmark, including Google Gemini 3 Pro (80.20) and OpenAI’s GPT 5.2 (69.80), with the lead most pronounced on pages with complex layouts and non-Latin scripts.


Sarvam Vision on olmOCR benchmark (Source: Sarvam)


The company has also published results on Indic-language document tests covering multiple scripts, where it reports higher word-level accuracy across a wide range of Indian languages compared to other OCR and vision-language systems.


What the benchmarks indicate


The recent results place Sarvam AI’s models ahead of several widely used systems on document OCR and layout understanding tasks, particularly for Indian scripts and complex page structures. The tests focus on how accurately a model can read text, follow reading order, and interpret structured content such as tables and multi-column layouts, rather than on general conversational ability.

 


This distinction matters because general-purpose AI models are trained to handle a wide range of tasks, while Sarvam’s systems are trained more narrowly on documents and Indian languages. That targeted training shows up in benchmarks that measure OCR accuracy, layout parsing, and script-level recognition.


Sarvam AI’s wider model portfolio


Alongside Sarvam Vision, the company has built a wider set of language models:

 


Bulbul: Text-to-speech model

 


Bulbul is Sarvam’s speech synthesis system that converts text into spoken output. Bulbul V3 is designed to handle multiple Indian languages, accents, and code-mixed speech patterns, and is aimed at use cases such as voice interfaces, assistants, and accessibility tools.

 


Saarika: Speech-to-text model

 


A speech recognition model that converts spoken Indian-language audio into text. Saarika supports transcription in around 11 Indian languages and serves as Sarvam’s base option for speech-to-text tasks.

 


Saaras: Speech-to-text translation model

 


Saaras combines speech recognition with direct translation, transcribing spoken input and producing translated text (for example, turning Indian-language speech into English text) in a single step.

 


Mayura: Text translation model

 


Mayura handles translation between languages, trained on conversational and real-world data across Indian languages. It is designed for more colloquial and contextual translation needs.

 


Sarvam-M: Multilingual reasoning language model

 


Sarvam-M is the company’s reasoning and conversational language model. Built as a multilingual text model with hybrid reasoning capabilities, it is tuned for better performance on Indian language benchmarks as well as tasks involving logic, maths and extended context.

 


On the application side, Sarvam runs Samvaad, a voice-based conversational system built on top of its speech and language models. Samvaad is designed to handle spoken interactions in multiple Indian languages and is aimed at use cases such as customer support, information access, and voice-driven services, particularly in settings where text-first interfaces are less practical.
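Samvaad’s internals are not public, but a voice service of this kind is conventionally one loop of speech recognition, language model, and speech synthesis. The minimal sketch below assumes that structure: the asr, llm and tts clients stand in for Saarika-, Sarvam-M- and Bulbul-style components, and every method name is an assumption rather than a published Sarvam API.

```python
def handle_voice_turn(audio_bytes: bytes, lang: str, asr, llm, tts) -> bytes:
    """One turn of a Samvaad-style voice interaction (hypothetical APIs)."""
    question = asr.transcribe(audio_bytes, language=lang)  # speech to text
    reply = llm.chat(question, language=lang)              # generate an answer
    return tts.synthesize(reply, language=lang)            # text to speech
```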

 



Snapchat brings Arrival Notifications option to Snap Map: How to set up


Arrival Notifications (Image: Snapchat)


Snapchat has introduced a new safety option called Arrival Notifications, expanding on its earlier “Home Safe” feature. According to Snapchat’s blog, the update allows users to automatically notify friends or family when they reach destinations other than home, such as a class, meeting, or travel stop. Snapchat said that the feature is designed to make location sharing more flexible and reduce the need to manually send messages.


Snapchat’s ‘Arrival Notifications’: How it works

Snapchat first rolled out the Home Safe feature to let users alert someone when they arrived home safely. Snapchat is now extending the idea to cover more everyday situations. The company said that users can now set alerts for specific places and choose whether they want them to trigger once or repeat regularly.

 
 

In a blog post, Snapchat said the feature can be useful for simple, routine moments. For example, users can automatically let someone know when they return to their hotel while travelling, arrive at a weekly class or reach a regular meeting spot. According to Snapchat, the goal is to make these updates happen automatically, without users having to remember to send a message each time. 

 


Snapchat has highlighted that privacy controls remain central to how Arrival Notifications work. The company mentioned that alerts can only be sent to people the user has chosen to share their location with. Location sharing on Snap Map is switched off by default, and a user’s location is only visible if they choose to turn it on. One-time alerts automatically expire after they are sent or after 24 hours if the destination is not reached.
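Taken literally, that description implies two lifecycle rules for a one-time alert: it fires at most once, and it lapses 24 hours after creation if the destination is never reached. The sketch below models this reading in Python; it is an interpretation of Snapchat’s blog post, not its implementation, and the notify helper and geofence destination are placeholders.

```python
from datetime import datetime, timedelta

ALERT_TTL = timedelta(hours=24)  # one-time alerts lapse after 24 hours

def notify(recipient, message):
    print(f"-> {recipient}: {message}")  # stand-in for a push notification

class OneTimeArrivalAlert:
    def __init__(self, destination, recipient):
        self.destination = destination  # placeholder geofence with .contains()
        self.recipient = recipient      # must already share location with user
        self.created_at = datetime.now()
        self.fired = False

    def is_active(self) -> bool:
        # Active until it has fired once or its 24-hour window has passed.
        return not self.fired and datetime.now() - self.created_at < ALERT_TTL

    def on_location_update(self, position) -> None:
        if self.is_active() and self.destination.contains(position):
            notify(self.recipient, "Arrived")
            self.fired = True  # one-time: expires once sent
```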


How to set up Arrival Notifications


  • Share your location with a friend.

  • Open your friendship profile and scroll to “Arrival Notifications”.

  • Select a location on the map and give it a personal name (for example, your ‘run club’ meeting spot or the venue for ‘piano lessons’).

  • Choose whether the alert should be one-time or recurring.




Why multilingual and multimodal AI is central to India's AI 'impact' agenda


Multilingual and multimodal artificial intelligence is set to be one of the core agendas for India at the upcoming AI Impact Summit, which kicks off on February 16 in New Delhi. Over the past few years, multiple government-backed projects and platforms have been rolled out to build AI systems that can work across Indian languages and across different formats such as text, speech and documents.

 


Among these projects are the Adi Vaani platform under the Ministry of Tribal Affairs, the BharatGen programme backed by the Department of Science and Technology and the Ministry of Electronics and Information Technology (MeitY), and the BHASHINI language platform under Digital India. Together, they point to a policy push that treats language and voice as core parts of India’s AI infrastructure rather than as add-ons.

 


What “multilingual” and “multimodal” AI mean


“Multilingual” AI refers to systems that can understand and generate content in more than one language, including Indian languages that are often poorly represented in global datasets.

 


“Multimodal” AI refers to models that can work across different types of input and output, such as text, speech and images or documents, instead of being limited to only text-based interactions.

 


For public-facing systems, this matters because many government services and information flows rely on a mix of spoken queries, scanned documents and text forms. A multimodal system can, in principle, take a spoken question in an Indian language, read a document, and return an answer in speech or text. Several of the projects being backed by the government are designed around this idea, rather than around English-first, text-only models.
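As a minimal sketch of that in-principle flow, the snippet below chains generic speech recognition, document OCR, question answering and speech synthesis components. All names (asr, ocr, qa, tts and their methods) are placeholders for illustration, not a specific platform’s API.

```python
def answer_spoken_query(audio, document_image, lang,
                        asr, ocr, qa, tts, reply_as_speech=True):
    """Spoken question plus a scanned document, answered in speech or text.

    All four components are placeholders for whichever models fill those
    roles; their method names are assumptions made for illustration.
    """
    question = asr.transcribe(audio, language=lang)     # speech to text
    doc_text = ocr.read(document_image, language=lang)  # document to text
    answer = qa.answer(question, context=doc_text)      # grounded response
    return tts.synthesize(answer, language=lang) if reply_as_speech else answer
```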


Why multilingual and multimodal AI matter for India


For India, the case for multilingual and multimodal AI is largely practical rather than theoretical. Government services, courts, welfare systems and local administrations operate across dozens of languages, and much of the information citizens interact with is not in a single, standardised format. It ranges from scanned forms and notices to spoken queries at service centres and helplines, making text-only and English-first systems a poor fit for large parts of the population.

 


India has hundreds of languages and dialects in active use, and a significant share of citizens rely primarily on regional or local languages for day-to-day interactions with the state. This creates a gap in access when digital systems are designed mainly around English or a small set of major languages. Multilingual AI systems are meant to reduce that gap by allowing the same service or interface to work across different languages without requiring separate, manual translations for each one.

 


The “multimodal” aspect addresses a different constraint. In many government workflows, information is not limited to typed text. It includes scanned documents, images, and spoken inputs. A system that can only process text leaves large parts of this information outside the digital workflow. Multimodal models are intended to handle this mix by combining text, speech and document or image understanding in a single pipeline.

 


This is also why many of the current government-backed projects are being framed around public services rather than consumer applications.

 


Platforms such as BHASHINI are being positioned for use in citizen-facing portals and administrative processes, while programmes like BharatGen are being funded to build a broader stack of text, speech and document-vision models for Indian languages.


The underlying policy logic is that without language and modality coverage, large sections of the population remain effectively excluded from digital systems, even if connectivity and devices are available.


India’s sovereign AI push: Platforms, programmes and startups


Much of India’s current work on multilingual and multimodal AI is being deployed through publicly funded platforms and research programmes, alongside a growing set of domestic startups working on language and vision models for Indian use cases.

 


One example is Adi Vaani, a translation platform for tribal languages launched by the Ministry of Tribal Affairs in beta last year. Developed by a consortium led by IIT Delhi with BITS Pilani, IIIT Hyderabad, IIIT Naya Raipur and several State Tribal Research Institutes, the platform currently supports languages such as Santali, Bhili, Mundari and Gondi, with more under development. According to the ministry, the system is meant to handle both text and speech translation and is being positioned for use in areas such as education, governance communication and documentation of oral traditions.

 


At a broader level, the government-backed BharatGen programme is aimed at building a full-stack, multilingual and multimodal AI system covering text, speech and document understanding. The project is being led by IIT Bombay with a consortium of other institutions and is supported through the National Mission on Interdisciplinary Cyber-Physical Systems, with Rs 235 crore routed via the Technology Innovation Hub at IIT Bombay. In addition, the programme has received further funding of Rs 1,058 crore under the IndiaAI Mission, taking total government support to over Rs 1,200 crore. BharatGen has already released multiple models, including a text model, speech recognition and text-to-speech systems, and a document-vision model designed to work with Indian-language content and formats.


Alongside this, the Ministry of Electronics and Information Technology is running BHASHINI as a language AI platform for public services. BHASHINI currently supports more than 36 languages in text and over 22 in voice, with hundreds of language models deployed across government websites and applications. The focus here has been on translation, speech recognition and text-to-speech tools that can be integrated into citizen-facing systems rather than standalone consumer products.

 


Outside the government system, several Indian startups are also working on what they describe as “sovereign” or India-focused AI models.

 


Bengaluru-based Sarvam AI, for instance, has published results on optical character recognition and speech models for Indian languages.

 


Krutrim, backed by Bhavish Aggarwal, is building a large multilingual language model focused on Indian contexts, while other companies and research groups, including initiatives such as AI4Bharat at IIT Madras, are working on open and commercial language models for Indian languages.


From policy to deployment: Where these systems are being used


So far, most of these multilingual and multimodal AI systems are being positioned first for use inside government workflows and public service delivery rather than as mass consumer products.

 


Platforms such as BHASHINI are being integrated into government portals and service interfaces, where translation, speech-to-text and text-to-speech tools can be used to make forms, advisories and help desks accessible in multiple languages. The Digital India BHASHINI Division has said the platform is already linked to hundreds of websites and live use cases across departments.

 


Similarly, BharatGen’s early demonstrations have focused on applications such as voice-based advisory systems for farmers, document question-and-answer tools for government records, and image-to-text or image-to-description systems for small businesses. These are designed to work with Indian languages and with inputs such as scanned documents or images, which are common in government and small-business workflows.


Adi Vaani, meanwhile, is being positioned more narrowly around tribal languages, with the stated aim of enabling translation, documentation and access to government information in languages that are often missing from mainstream digital platforms. At this stage, it remains in beta, with limited language coverage, but it reflects a similar approach of starting with public-sector and community-facing use cases.


The constraints: Data, quality and scale


Despite the scale of funding and institutional backing, these projects face practical challenges. One of the main issues is data. High-quality, labelled data in many Indian and tribal languages remains limited, especially for speech and for specialised domains such as legal or administrative documents. This affects both accuracy and reliability, particularly outside a small set of better-resourced languages.

 


Another constraint is variation. Indian languages differ widely in script, grammar, pronunciation and regional usage, which makes building a single system that works consistently across regions difficult. Even within the same language, dialect and accent differences can affect performance in speech systems.

 


There is also the question of scale and cost. Multimodal models that handle text, speech and documents require significantly more computing resources than text-only systems. This makes deployment across large government systems more complex, especially when these tools are expected to work in real time and at population scale.



India AI Impact Summit: Govt migrates BHASHINI to Indian cloud platform


India-based cloud service provider Yotta Data Services and the Digital India BHASHINI Division have moved BHASHINI’s language AI platform from a global hyperscaler to an Indian cloud setup, shifting the system to Yotta’s Government Community Cloud and Shakti Cloud. With the move, BHASHINI is now operating entirely on Indian cloud and GPU infrastructure, keeping its datasets, models and user interactions within the country’s jurisdiction.

 


The migration was showcased at a pre-summit event ahead of the India AI Impact Summit 2026 and draws on a recent deployment during the Maha Kumbh 2025, where BHASHINI’s services were used to provide translation and voice-based assistance in more than 11 Indian languages. According to the press release, the platform handled real-time requests at population scale during the event, including through a multilingual assistant built for visitors.

 


BHASHINI migration to India-based cloud: Details


According to the details shared, the migration was carried out over a two-to-three-month period and covered BHASHINI’s full AI stack, including multilingual datasets, models, APIs, containerised services, orchestration pipelines, databases and storage. The new setup runs on Yotta’s Shakti Cloud, which uses Nvidia H100 GPUs, and is built using open-source and cloud-agnostic components.


As per the release, the transition involved moving more than 200 terabytes of data and over 3.5 billion files, with no data loss reported during the process. The two organisations also said the platform has been designed as a modular and reusable framework that can be adopted across ministries, public sector units and large national programmes.
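The release does not describe how “no data loss” was verified, but migrations at this scale are typically validated by comparing file inventories and content checksums between source and destination. The sketch below illustrates that generic technique; it is an assumption about standard practice, not a description of the actual BHASHINI verification process.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk_size):
            digest.update(block)
    return digest.hexdigest()

def inventory(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its content hash."""
    return {str(p.relative_to(root)): sha256_of(p)
            for p in root.rglob("*") if p.is_file()}

def verify_migration(source: Path, destination: Path) -> bool:
    """True only if every source file exists at the destination, unchanged."""
    src, dst = inventory(source), inventory(destination)
    missing = src.keys() - dst.keys()
    corrupted = {p for p in src.keys() & dst.keys() if src[p] != dst[p]}
    return not missing and not corrupted
```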

 


Amitabh Nag, CEO of the Digital India BHASHINI Division, said, “The move to Yotta’s sovereign AI cloud gives BHASHINI greater control, resilience, and scalability as it continues to serve India’s linguistic diversity. This transformation strengthens our ability to deliver inclusive, real-time multilingual services and marks a major step forward for Digital Public Infrastructure in AI. It will also serve as a blueprint for future deployments as we transition to a fully sovereign stack.”


For Yotta, the project is being presented as a proof point for running large AI workloads on Indian cloud infrastructure. Sunil Gupta, Co-founder, Managing Director and CEO, Yotta Data Services, said, “This transition highlights that hyperscale, mission-critical AI platforms can be built and operated entirely on sovereign infrastructure, without compromise. The project validates India’s ability to run advanced AI workloads on open, interoperable architectures and reflects Yotta’s capability to build and operate digital infrastructure at national scale.”


What is BHASHINI


BHASHINI is an Indian language AI platform under the Digital India programme, focused on translation, speech recognition and text-to-speech tools for Indian languages. It is designed to be used in citizen-facing services and government systems.

 


The platform supports more than 36 languages in text and over 22 languages in voice, and is already integrated into several websites and applications across government departments. Its tools are meant to help users interact with digital services through speech and regional languages, and to make forms, advisories and other public information available beyond English and a few major languages.

 


BHASHINI is also part of the broader IndiaAI Mission, which is funding domestic AI infrastructure, models and applications. The migration to Yotta’s cloud is being framed by the government as one step in building what it describes as a “sovereign” AI stack, where both the computing infrastructure and the data remain under Indian control, especially for large public platforms that operate at national scale.



What makes India AI Impact Summit different from earlier global AI meetings


The India AI Impact Summit 2026 will be held from February 16 to 20 at Bharat Mandapam, New Delhi, positioning India at the centre of the evolving global debate on artificial intelligence. Hosted by the Government of India under the IndiaAI Mission, the five-day event will bring together governments, technology companies, researchers and civil society groups to discuss how AI is being governed, built and deployed.

 


Unlike earlier global AI summits that were largely shaped by safety concerns or regulatory coordination, the India meet is being framed around ‘impact’ — that is, how AI is applied on the ground, who benefits from it, and how emerging economies fit into the global AI ecosystem.

 
 


The starting point: AI safety at Bletchley Park

 


The current series of global AI meetings began with the AI Safety Summit at Bletchley Park in the United Kingdom on November 1–2, 2023. Convened by the UK government, the summit concentrated almost entirely on the risks posed by advanced or “frontier” AI systems.

 


Its main outcome was the Bletchley Declaration, which was endorsed by 28 countries and the European Union, including the US, China, India and the UK. The declaration acknowledged both the opportunities and potential harms of AI, committing signatories to cooperate on evidence-based research into risks such as bias, misinformation and long-term safety concerns. Frontier AI companies, including OpenAI, Google DeepMind and Anthropic, also agreed to share safety testing information with governments.

 


However, participation at Bletchley Park was limited in scope. The discussions were mainly government-led, with safety as the dominant lens, and relatively little attention was paid to questions of deployment, economic impact or inclusion.

 


Broader participation in the Seoul summit

 


The AI Seoul Summit was held on May 21–22, 2024 and marked the second phase of this process. Hosted by South Korea, the summit expanded participation beyond governments to include industry, academia and civil society.

 


The meeting produced the Seoul Declaration for Safe, Innovative and Inclusive AI, which was adopted by 11 countries and the European Union. It committed participants to work towards interoperable global governance frameworks, drawing on the G7’s Hiroshima Process Code of Conduct. Sixteen major AI companies, including OpenAI, Google and Anthropic, pledged voluntary transparency around safety frameworks, risk thresholds considered “intolerable”, and mitigation measures.

 


Another group of countries, including the EU, agreed to collaborate on AI safety science, shared testing standards and risk identification, including severe harms such as misuse in chemical or biological contexts.

 


Seoul signalled a shift away from safety alone, placing greater emphasis on innovation and governance processes, while still keeping risk management at the core.

 


Paris AI summit and the turn towards action

 

The third major milestone was the AI Action Summit, held in Paris on February 10–11, 2025, and co-chaired by French President Emmanuel Macron and Prime Minister Narendra Modi. The summit drew more than 1,000 participants from over 100 countries, including international organisations, researchers, companies and civil society groups.

 


Unlike Bletchley and Seoul, the Paris summit placed “action” at the centre of its agenda, seeking concrete commitments around governance, economic impact and the societal implications of AI. Paris moved the conversation decisively towards implementation. Discussions covered public-interest AI, including digital public infrastructure and multilingual models; the future of work and skilling; trust, ethics and security; and global governance.

 


A joint declaration titled Inclusive and Sustainable Artificial Intelligence for People and the Planet was signed by 58 countries, though the US and the UK chose not to sign, citing concerns around regulation, national security and lack of clarity around AI governance.

 


The summit also saw the launch of the Current AI Initiative, backed by an initial $400 million, and the formation of a sustainability coalition focused on AI’s environmental footprint.

 


What India is trying to do differently this time

 


Against this backdrop, the India AI Impact Summit is positioned as a next step rather than a repetition. According to government briefings, the emphasis is on moving from principles and declarations to deployment and outcomes.

 


One major distinction is scale. More than 35,000 registrations from over 100 countries have been reported ahead of the summit, with heads of government, ministers and senior executives expected to attend. Global technology leaders such as Sundar Pichai, Sam Altman, Jensen Huang and Bill Gates are listed among confirmed participants.

 


Another difference is geography. This is the first summit in the series to be hosted in the Global South. Indian officials have described this as an attempt to broaden who sets the agenda on AI, particularly for developing economies that are often users rather than designers of AI systems.

 


From discussion to deployment

 


The summit’s agenda is structured around impact areas such as employment, trust, safety and sectoral applications. Sessions are planned on AI use in healthcare, education and governance, alongside discussions on labour markets and model safety.

 


Alongside policy discussions, the event will host the India AI Impact Expo, showcasing deployable AI solutions from startups, research institutions and technology firms. This focus on working systems and real-world use cases marks a departure from earlier summits that were primarily centred on regulation and safety.



Nothing Phone 4a may launch in blue, pink and yellow colours: Details


UK-based consumer electronics brand Nothing appears to be preparing for the launch of its Phone 4a series. The company has shared a new teaser that hints that the Phone 4a series may launch in multiple colours, including black, white, blue, pink and yellow. In a post on X, Nothing wrote “Soon,” alongside a graphic made up of coloured dots forming the “(a)” logo that the company uses on its A-series products. The Nothing Phone 4a series could feature two phones: Phone 4a and Phone 4a Pro.


Nothing Phone 4a series: What to expect

The graphics posted by Nothing show dots in black and white, along with blue, pink, and yellow colours. This could signify the colours in which Nothing may launch its Phone 4a series models. Last month, Nothing shared a video titled “Phone (4a): A New Chapter” on its YouTube channel, in which the company’s CEO, Carl Pei, said that Nothing is experimenting with more premium materials and new colour options to refresh the look and feel of the device, aligning with the new teaser.

 
 


Blue is not entirely new for the brand, as it has previously appeared in select variants of Nothing’s A-series devices. Yellow, meanwhile, was used prominently on the Nothing Ear (a) earbuds launched in 2024, making it a familiar colour. The inclusion of multiple bright colour options suggests Nothing may continue its focus on design and visual identity with the Phone 4a series.

 


So far, the company has not shared any details about specifications or a launch timeline. However, the teaser indicates that an official announcement could happen soon, with more information expected in the coming weeks. In the video shared by the company last month, Pei described the Phone 4a as “a complete evolution” over its predecessor, spanning the display, camera, and overall performance.


According to a previous report by 9To5Google, the Nothing Phone 4a series has surfaced in regulatory filings, with the Pro model appearing on the European Union’s energy labelling website. The listing revealed some early details about the device.

 

As per the report, the Nothing Phone 4a models could feature a slightly larger battery, with listings showing a rated capacity of 5,080mAh. By comparison, the Phone 3a series ships with a 5,000mAh battery but has a rated capacity of 4,920mAh. The Nothing Phone 4a series smartphones are also expected to come with improved durability, including an upgraded IP65 rating for dust and water resistance. 

 


According to a report by GSMArena, both phones in the lineup are expected to be powered by Qualcomm Snapdragon 7-series chips, although the exact processors have not yet been revealed. The Pro variant is said to support eSIM, and at least one version is also expected to feature UFS 3.1 storage.

 


In related news, Nothing has confirmed that its upcoming smartphones will cost more, citing a sharp rise in memory prices driven by global demand from artificial intelligence data centres. Recently, on X, Nothing CEO Carl Pei said the company will raise prices across its smartphone portfolio in 2026, as rising component costs make it difficult to maintain current pricing without cutting specifications.

 


