Multimodal Conversational AI

Accelerate the full pipeline, from automatic speech recognition to natural language understanding and text-to-speech.

AI-driven services in speech, vision, and language present a revolutionary path for personalized natural conversation, but they face strict accuracy and latency requirements for real-time interactivity. With NVIDIA’s conversational AI platform, developers can quickly build and deploy state-of-the-art AI services to power applications across a single unified architecture, delivering highly accurate, low-latency systems with little upfront investment.

Conversational AI Models From NGC

State-of-the-Art Models

Harness conversational AI models from NGC™ that are trained for more than 100,000 hours on NVIDIA DGX systems.

Multimodal Solutions to Build Human-Like Interactive Skills

Custom Multimodal Skills

Effortlessly fuse speech, language, and vision into a single pipeline to build human-like interactive skills.

Deploy Optimized Models in the Cloud & Data Center

Rapid Deployment

Deploy optimized models in the cloud, in the data center, and at the edge with a single command.

End-to-End Acceleration to Execute Model Inference Under the 300 ms Latency Bound

End-to-End Acceleration

Accelerate at pipeline scale and execute model inference in well under the 300 millisecond (ms) latency bound.

True End-to-End Acceleration

Fully Accelerated Pipeline

Full Pipeline Inference in Fractions of a Second

Execute full conversational AI pipelines consisting of automatic speech recognition (ASR) for audio transcription, natural language understanding (NLU), and text-to-speech (TTS) in well under the 300 ms latency bound for real-time interactions, freeing up room to increase pipeline complexity without sacrificing user experience. 
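As a rough illustration of the latency budget described above, the sketch below sums per-stage latencies for a sequential ASR, NLU, and TTS pipeline and reports how much of the 300 ms bound remains for added complexity. The `Stage` class, the function names, and the per-stage numbers are hypothetical placeholders chosen for illustration; they are not measurements and not part of any NVIDIA API.

```python
from dataclasses import dataclass

LATENCY_BUDGET_MS = 300.0  # real-time bound quoted in the text


@dataclass
class Stage:
    name: str
    latency_ms: float  # hypothetical latency for this stage


def pipeline_latency_ms(stages):
    """Total latency of stages that run one after another."""
    return sum(stage.latency_ms for stage in stages)


def headroom_ms(stages, budget_ms=LATENCY_BUDGET_MS):
    """Budget left for extra pipeline complexity (negative if over budget)."""
    return budget_ms - pipeline_latency_ms(stages)


# Placeholder numbers for illustration only, not benchmark results.
stages = [Stage("ASR", 50.0), Stage("NLU", 25.0), Stage("TTS", 100.0)]
total = pipeline_latency_ms(stages)
print(f"total={total:.1f} ms, headroom={headroom_ms(stages):.1f} ms")
```

With real measurements substituted for the placeholders, a positive headroom value indicates room to add stages (for example, a vision model) while staying within the real-time bound.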

The NVIDIA A100 Tensor Core GPU delivered record-setting performance in the MLPerf Training v0.7 benchmark, clocking in at 6.53 hours per accelerator for BERT on WikiText and 0.83 minutes at scale.

NVIDIA Solutions for Conversational AI Applications

Train and Deploy with Purpose-Built Systems

Train at Scale

NVIDIA DGX A100 features eight NVIDIA A100 Tensor Core GPUs—the most advanced data center accelerator ever made. Tensor Float 32 (TF32) precision delivers a 20X AI performance improvement over previous generations—without any code change—and an additional 2X performance boost by leveraging structural sparsity across common NLP models. Third-generation NVIDIA® NVLink®, second-generation NVIDIA NVSwitch, and NVIDIA Mellanox® InfiniBand enable ultra-high-bandwidth and low-latency connections between all the GPUs. This allows multiple DGX A100 systems to train massive billion-parameter models at scale to deliver state-of-the-art accuracy. And with NVIDIA NeMo, an open-source toolkit, developers can build, train, and fine-tune DGX-accelerated conversational AI models with only a few lines of code.

NVIDIA DGX A100 - Universal System for AI Infrastructure

Deploy at the Edge

NVIDIA EGX Platform makes it possible to drive real-time conversational AI while avoiding networking latency by processing high-volume speech and language data at the edge. With NVIDIA TensorRT, developers can optimize models for inference and deliver conversational AI applications with low latency and high throughput. With the NVIDIA Triton Inference Server, the models can then be deployed in production. TensorRT and Triton Inference Server work with NVIDIA Jarvis, an application framework for conversational AI, for building and deploying end-to-end, GPU-accelerated multimodal pipelines on EGX. Under the hood, Jarvis applies TensorRT, configures the Triton Inference Server, and exposes services through a standard API, deploying with a single command through Helm charts on a Kubernetes cluster.

AI-Driven Multimodal Skills

Multi-Speaker Transcription

Classic speech-to-text algorithms have evolved, making it now possible to transcribe meetings, lectures, and social conversations while simultaneously identifying speakers and labeling their contributions. NVIDIA Jarvis enables the fusion of multi-sensor audio and vision data into a single stream of information used for advanced transcription components, like the visual diarization necessary to differentiate multiple voices in real time.
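To make the idea of labeling each speaker's contribution concrete, here is a minimal sketch that merges timestamped words with speaker segments. It assumes a diarizer that emits `(start, end, speaker)` segments and an ASR system that emits `(time, word)` pairs; both formats and the `label_words` helper are hypothetical and do not reflect the Jarvis API.

```python
from typing import List, Tuple

# (start_s, end_s, speaker): hypothetical diarizer output format
Segment = Tuple[float, float, str]
# (time_s, word): hypothetical ASR word-timestamp format
Word = Tuple[float, str]


def label_words(words: List[Word], segments: List[Segment]) -> List[Tuple[str, str]]:
    """Group consecutive words by the speaker whose segment contains each word's time."""
    turns: List[Tuple[str, List[str]]] = []
    for time_s, word in words:
        # Pick the first segment covering this timestamp, else mark as unknown.
        speaker = next(
            (spk for start, end, spk in segments if start <= time_s < end),
            "unknown",
        )
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(word)  # same speaker keeps talking
        else:
            turns.append((speaker, [word]))  # a new turn begins
    return [(speaker, " ".join(tokens)) for speaker, tokens in turns]


# Made-up sample data for illustration.
segments = [(0.0, 2.0, "speaker_1"), (2.0, 4.0, "speaker_2")]
words = [(0.3, "good"), (0.8, "morning"), (2.4, "hello"), (3.1, "everyone")]
transcript = label_words(words, segments)
```

A production system would also need to handle overlapping speech and words that fall between segments; this sketch simply marks the latter as `unknown`.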


Virtual Assistant

Virtual assistants can engage with customers in a nearly human-like way, powering interactions in contact centers, smart speakers, and in-car intelligent assistants. AI-driven services like speech recognition, language understanding, voice synthesis, and vocoding alone cannot support such a system, as they’re missing key components such as dialogue tracking. Jarvis supplements these backbone services with easy-to-use components that can be extended for any application.

Accelerating Enterprises and Developer Libraries

  • Ecosystem Partners
  • Developer Libraries

GPU-accelerate top speech, vision, and language workflows to meet enterprise-scale requirements.

Intelligent Voice

Popular conversational AI libraries for building GPU-accelerated, state-of-the-art deep learning models.

Hugging Face

Industry Use Cases

Curai’s Platform to Enhance Patient Experience

Chat-Based App Enhances Patient Experience

Using natural language processing, Curai’s platform allows patients to share their conditions with their doctors and access their own medical records, and it helps providers extract data from medical conversations to better inform treatment.

Square Takes Edge Off Conversational AI with GPUs

Learn about Square Assistant, a conversational AI engine that empowers small businesses to communicate with their customers more efficiently.

Transforming Financial Services with Conversational AI

Discover what the enterprise journey should look like for successful implementation and how to enable your business through ROI.

Get Started Accelerating Conversational AI Today

Train AI Models with NVIDIA NeMo Framework

Run Training on NVIDIA DGX A100 Systems

Simplify Deployment with NVIDIA Jarvis Framework

Deploy to the Edge on NVIDIA EGX Platform