Live Captions and Transcription Services for Microsoft Teams


Microsoft Teams enables highly accurate live meeting captioning and transcription services in 28 languages.


Microsoft Teams


Microsoft Azure

Use Case

Real-time multi-language meeting captioning and transcription


Microsoft Azure Cognitive Services, NVIDIA GPUs on Azure, NVIDIA Triton Inference Server

Live Captioning and Transcription in Microsoft Teams

Microsoft Teams is a collaboration app with nearly 250 million monthly active users. To better accommodate non-native speakers and meeting attendees who are deaf or hard of hearing, Microsoft relies on AI-generated live captions and real-time transcription.

NVIDIA Solutions

For accurate live captioning and transcription in multiple languages, the Microsoft Teams app uses Microsoft Azure Cognitive Services and NVIDIA Triton™ Inference Server. This lets Teams use advanced language models that recognize jargon, names, and other meeting context to deliver highly accurate, personalized speech-to-text results in real time with very low latency.

Microsoft Teams Results

Using Triton Inference Server in Azure Cognitive Services enables live transcription and captions with state-of-the-art speech models in 28 languages. Triton Inference Server delivers low-latency, real-time inference for the speech recognition models and ensures that the models use GPUs to their full potential. This reduces the cost to customers by delivering higher throughput with fewer computational resources.


  • Cost-efficient, accurate real-time captioning and transcription across 28 languages.
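
The cost argument above (higher throughput per GPU means fewer GPUs for the same load) can be illustrated with simple arithmetic. The sketch below is only an illustration: the function names, stream counts, and hourly price are hypothetical and are not figures from Microsoft or NVIDIA.

```python
def gpus_needed(concurrent_streams: int, streams_per_gpu: int) -> int:
    """Smallest number of GPUs able to serve the given number of audio streams."""
    if streams_per_gpu <= 0:
        raise ValueError("streams_per_gpu must be positive")
    # Integer ceiling division: round up so no stream is left unserved.
    return -(-concurrent_streams // streams_per_gpu)


def hourly_cost(concurrent_streams: int, streams_per_gpu: int,
                gpu_hourly_price: float) -> float:
    """Hourly cost of the GPU fleet at a flat (assumed) per-GPU price."""
    return gpus_needed(concurrent_streams, streams_per_gpu) * gpu_hourly_price


# Hypothetical numbers, for illustration only.
streams = 10_000   # assumed concurrent meetings needing captions
price = 3.0        # assumed price per GPU-hour

baseline = hourly_cost(streams, streams_per_gpu=50, gpu_hourly_price=price)
improved = hourly_cost(streams, streams_per_gpu=100, gpu_hourly_price=price)

print(f"baseline: {baseline:.2f} per hour")
print(f"improved: {improved:.2f} per hour")
```

Under these assumed numbers, doubling the streams each GPU can serve halves the fleet size and therefore the hourly cost. Real savings depend on the actual models, latency targets, and pricing.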

About Microsoft Teams

Microsoft Teams is a collaboration app built for hybrid work that enables teams to stay informed, organized, and connected, all in one place. Customers use Teams to communicate, collaborate, and co-author content across work, life, and learning every day.

“AI models like these are incredibly complex, requiring tens of millions of neural network parameters to deliver accurate results across dozens of different languages. But the bigger a model is, the harder it is to run cost-effectively in real time.”

Principal PM Manager for Teams Calling and Meetings and Devices