Global Public Sector

Sarvam Brings AI Adapted for India to 1.4 Billion People With the NVIDIA Accelerated Computing Platform

Objective

Sarvam is building the AI infrastructure for India—multilingual, population-scale, and culturally aware by design. As the first startup selected under India's IndiaAI Mission, Sarvam set out to train, align, and serve foundation models for a nation of 1.4 billion people speaking dozens of languages and hundreds of dialects.

By deploying the NVIDIA accelerated computing platform—from NVIDIA H100 GPUs and NVIDIA Quantum InfiniBand networking to NVIDIA Nemotron™ datasets, NVIDIA NeMo™ libraries, NVIDIA NIM™ microservices, and NVIDIA Cloud Functions—Sarvam turned fragmented research infrastructure into a unified AI factory, cutting production-scale time-to-first-inference from weeks to minutes and delivering reliable, real-time, voice-based AI services to every resident in their native language.

Customer

Sarvam AI

Partner

Yotta, NVIDIA Cloud Partner

Use Case

Generative AI / LLMs

Products

Key Takeaways

Sarvam 30B and 105B open models and their multimodal variants deliver leading accuracy in Indian languages and excel in agentic tasks—including reasoning, instruction following, and tool calling.

Near-linear GPU scaling was achieved across 4,096+ NVIDIA H100 GPUs for training using NVIDIA Quantum InfiniBand and Megatron-LM 6D parallelism.

Time-to-first-inference for production scale was reduced from weeks to minutes using NVIDIA Cloud Functions.

Discover how Sarvam is pioneering a "Made in India" full-stack AI platform. From its compact, high-performing Sarvam-1 frontier model to open-weights architectures, Sarvam is engineering affordable infrastructure designed to scale across 1.4 billion people.

Challenge

Serving AI Applications at Scale

India presents AI builders with a challenge found nowhere else on Earth: more than 1.4 billion people, 22 officially recognized languages, and hundreds of dialects. Sarvam AI was founded to meet that challenge by building production-grade AI that speaks India's languages, operates within India's borders, and earns India's trust.

But being the first to attempt something at this scale meant confronting infrastructure problems that no one had solved before. Sarvam's early GPU clusters were unreliable during month-long pretraining runs, risking data loss. Preprocessing terabytes of raw Indian language data (noisy, code-mixed Hinglish text scraped from the web and low-fidelity audio from Indian phone networks) was slow, and GPUs were often underutilized while CPU-bound data pipelines caught up. The gap between a trained model and a real-time production service was measured not in hours but in weeks. On top of all this, Sarvam needed a serving stack that could deliver low-latency voice interactions.

Most critically, Sarvam struggled to scale its training workloads across large numbers of GPUs. Training models with 100B+ parameters requires thousands of GPUs running at high utilization and reliability. Generic distributed training frameworks couldn't keep up, and synchronizing parameters across massive clusters became a networking bottleneck. What Sarvam needed wasn't just better tools. It needed an entirely different architecture.
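A rough estimate shows why parameter synchronization strains the network at this size. The sketch below assumes pure data parallelism, 2-byte (bf16) gradients, and a ring all-reduce, in which each GPU sends about 2 × (N − 1) / N times the gradient payload per step. The function name and the figures are illustrative assumptions, not Sarvam's actual configuration.

```python
def ring_allreduce_bytes_per_gpu(num_params, bytes_per_param, num_gpus):
    """Bytes each GPU sends in one ring all-reduce over the full gradient."""
    if num_params < 0 or bytes_per_param <= 0 or num_gpus < 1:
        raise ValueError("invalid model or cluster size")
    payload = num_params * bytes_per_param
    # Reduce-scatter plus all-gather: 2 * (N - 1) / N of the payload per GPU.
    return 2 * (num_gpus - 1) / num_gpus * payload


# Hypothetical setup: 100B parameters, bf16 gradients, 4,096 GPUs.
sent = ring_allreduce_bytes_per_gpu(100_000_000_000, 2, 4096)
print(f"{sent / 1e9:.1f} GB sent per GPU per step")
```

Under these assumptions each GPU moves roughly 400 GB per optimizer step, which is why splitting the model across several parallelism dimensions and using a high-bandwidth interconnect both matter.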


Results

Unified AI Factory for Serving Population-Scale Workloads

Sarvam, accelerated by NVIDIA, now reaches 1.4 billion Indians. The Unique Identification Authority of India (UIDAI) uses Sarvam's AI for Aadhaar, India’s national digital identity program, to deliver near-real-time, voice-based feedback and fraud alerts to India's entire population in their native languages. Such a deployment would be impossible without millisecond-level multilingual inference at scale.

Time-to-first production-ready inference has been reduced from weeks to minutes. NVIDIA Cloud Functions compressed what was once a weeks-long deployment cycle into a rapid, repeatable process—letting Sarvam iterate on production models at the speed the mission demands.

Near-linear scaling has been achieved across 4,096+ NVIDIA H100 GPUs. NVIDIA Quantum InfiniBand and Megatron-LM's 6D parallelism eliminated the networking bottlenecks that had previously capped what Sarvam could train. Mixture-of-experts (MoE) models with 100B+ parameters now run efficiently across thousands of GPUs on Yotta.
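"Near-linear" scaling is usually judged by comparing measured speedup against ideal speedup as GPUs are added. The following is a minimal sketch of that calculation. The function name and the throughput figures are hypothetical placeholders, not measurements from Sarvam's cluster.

```python
def scaling_efficiency(base_gpus, base_throughput, gpus, throughput):
    """Measured speedup divided by ideal (linear) speedup; 1.0 is perfectly linear."""
    if min(base_gpus, gpus) <= 0 or min(base_throughput, throughput) <= 0:
        raise ValueError("GPU counts and throughputs must be positive")
    ideal_speedup = gpus / base_gpus
    measured_speedup = throughput / base_throughput
    return measured_speedup / ideal_speedup


# Hypothetical numbers: 8x the GPUs yields 7.6x the tokens per second.
efficiency = scaling_efficiency(512, 1.0e6, 4096, 7.6e6)
print(f"{efficiency:.0%}")  # prints 95%
```

An efficiency close to 1.0 means added GPUs translate almost entirely into added training throughput rather than being lost to communication overhead.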

Voice agentic workflows are now operational. Sarvam can build end-to-end voice agentic pipelines—integrating automatic speech recognition, large language models, and text-to-speech into a single low-latency flow—enabling natural, conversational AI interactions that were previously blocked by cumulative latency.
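Cumulative latency matters because a sequential voice pipeline pays the cost of every stage before the user hears a reply. The sketch below models that as a simple latency budget. The stage latencies and the budget are hypothetical values chosen for illustration, and real pipelines can overlap stages through streaming, which this simple sum ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    latency_ms: float


def end_to_end_latency_ms(stages):
    """Total latency when stages run strictly one after another."""
    return sum(stage.latency_ms for stage in stages)


def within_budget(stages, budget_ms):
    """True if the sequential pipeline fits the response-time budget."""
    return end_to_end_latency_ms(stages) <= budget_ms


# Hypothetical per-stage latencies for one conversational turn.
pipeline = [
    Stage("automatic speech recognition", 150.0),
    Stage("large language model", 400.0),
    Stage("text-to-speech", 200.0),
]
print(end_to_end_latency_ms(pipeline), within_budget(pipeline, 800.0))
```

Because the stage latencies add up, shaving time from any single stage, or from the hand-offs between stages, directly shortens the conversational turn.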

Enterprise automation now runs across India. For enterprise clients including Tata Capital and Infosys, Sarvam's agents automate KYC, sales, and customer support across telephony and WhatsApp in local Indian languages, expanding market reach to previously underserved populations.

The results made the case for something larger still.

“NVIDIA's full-stack accelerated computing platform, from NVIDIA Quantum InfiniBand networking and H100 GPUs to NVIDIA Nemotron, NeMo, and Dynamo, turned our research infrastructure into a unified, production-grade AI factory. We went from weeks to minutes on time-to-first production inference, scaled near-linearly across thousands of GPUs, and are now delivering real-time voice AI in Indian languages to over a billion people. That's what it takes to build AI at a national scale.”

Dr. Pratyush Kumar
Founder, Sarvam AI

Looking Ahead

Proving That Specialized AI Can Scale—and Inspiring What Comes Next

Sarvam is now expanding its platform to serve additional government ministries and enterprise clients across India, with plans to deepen integration of NVIDIA Dynamo for inference optimization and continue scaling its AI infrastructure with NVIDIA Cloud Partners, including Yotta. The team is also extending its model lineup to cover additional Indian languages and multimodal use cases, including vision-language understanding tailored to Indian documents and regional content.

Longer term, Sarvam's infrastructure is designed not just for India but as a blueprint for others. Other nations building AI capabilities face the same structural challenges: fragmented infrastructure, scarce domain-specific data, and the need to serve populations that global models were never designed to reach. Sarvam's proof that population-scale, multilingual AI is achievable with the NVIDIA accelerated computing platform represents a new model for how nations can take ownership of AI without sacrificing performance or safety.

Learn more about NVIDIA solutions for the global public sector.
