Conversational AI

Accelerate the Full Pipeline, from Speech Recognition to Language Understanding and Speech Synthesis

AI-driven services in speech and language present a revolutionary path for personalized natural conversation, but they face strict accuracy and latency requirements for real-time interactivity. With NVIDIA’s conversational AI SDK, developers can quickly build and deploy state-of-the-art AI services to power applications across a single unified architecture, delivering highly accurate, low-latency systems with little upfront investment.

Conversational AI Models From NGC

World-Class Accuracy

Harness conversational AI models from NGC that are trained on various open and proprietary datasets for more than 100,000 hours on NVIDIA DGX systems.

Multimodal Solutions to Build Human-Like Interactive Skills

Fully Customizable

Customize speech and language skills at every stage of the process, from data to model to pipeline. 

Deploy Optimized Models in the Cloud & Data Center

Scalable Deployment

Scale your applications easily to handle hundreds to thousands of concurrent requests.

End-to-End Acceleration to Execute Model Inference Under the 300 ms Latency Bound

Real-Time Performance

Execute end-to-end model inference within the 300-millisecond (ms) latency bound.

Introduction to Conversational AI

Download our e-book for an introduction to conversational AI, how it works, and how it’s applied in industry today.

True End-to-End Acceleration

Fully Accelerated Pipeline

Full Pipeline Inference in Fractions of a Second

Execute full conversational AI pipelines, consisting of automatic speech recognition (ASR) for audio transcription, natural language understanding (NLU), and text-to-speech (TTS), in well under the 300 ms latency bound for real-time interactions. This frees up room to increase pipeline complexity without sacrificing user experience.
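To make the latency bound concrete, here is a minimal Python sketch of timing a three-stage pipeline against a 300 ms budget. The stage functions (fake_asr, fake_nlu, fake_tts) and the helper names are hypothetical stand-ins for illustration only; they are not NVIDIA APIs, and real stages would call actual ASR, NLU, and TTS services.

```python
import time

LATENCY_BOUND_MS = 300.0  # real-time bound quoted in the text


def run_pipeline(audio, stages):
    """Run stages in order; return (final output, per-stage latency in ms)."""
    timings = {}
    data = audio
    for name, stage in stages:
        start = time.perf_counter()
        data = stage(data)
        timings[name] = (time.perf_counter() - start) * 1000.0
    return data, timings


def headroom_ms(timings, bound_ms=LATENCY_BOUND_MS):
    """Milliseconds left in the budget; a negative value means the bound is missed."""
    return bound_ms - sum(timings.values())


# Hypothetical stand-in stages (assumptions, not real models).
def fake_asr(audio):
    return "turn on the lights"


def fake_nlu(text):
    return {"intent": "lights_on", "text": text}


def fake_tts(result):
    return b"\x00" * 16  # placeholder for synthesized audio bytes


stages = [("asr", fake_asr), ("nlu", fake_nlu), ("tts", fake_tts)]
output, timings = run_pipeline(b"raw-audio", stages)
print(timings, headroom_ms(timings))
```

Whatever headroom remains after the three stages is the room available for extra pipeline complexity, such as a larger NLU model or an added dialogue step.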

NVIDIA Solutions for Conversational AI Applications

Training Solutions

Easily Develop Models with NVIDIA NeMo

Build, train, and fine-tune state-of-the-art speech and language models using an open-source framework, NVIDIA NeMo™.

Smarter Training with the NVIDIA TAO Toolkit

Speed up development time by 10X using production-quality, NVIDIA-pretrained models and the NVIDIA TAO Toolkit.

NVIDIA DGX A100 for AI Infrastructure

Run Training on NVIDIA DGX A100 Systems

Accelerate time to solution by training powerful billion-parameter language models with unmatched speed and scalability.

Deployment Solutions

NVIDIA Riva - Conversational AI Services

Simplify Deployment with NVIDIA Riva

Deploy optimized conversational AI services for maximum performance in the cloud, in the data center, and at the edge.

Enable Real-Time Conversation With NVIDIA

Deploy at the Edge with NVIDIA EGX Platform

Enable real-time conversation while avoiding networking latency by processing high-volume speech and language data at the edge.

Train and Deploy with Purpose-Built Systems

Train at Scale

NVIDIA DGX A100 features eight NVIDIA A100 Tensor Core GPUs, the most advanced data center accelerator ever made. Tensor Float 32 (TF32) precision delivers a 20X AI performance improvement over previous generations without any code changes, and structural sparsity adds a further 2X performance boost across common NLP models. The A100's design allows multiple DGX A100 systems to train massive billion-parameter models at scale and deliver state-of-the-art accuracy. NVIDIA provides the NeMo and TAO toolkits for distributed training of conversational AI models on A100.
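As a rough illustration of what TF32 precision means numerically, the sketch below emulates it in plain Python. It assumes the commonly documented TF32 layout (float32's 8-bit exponent with a 10-bit mantissa) and uses simple truncation; actual hardware rounding may differ, and the function name to_tf32 is hypothetical.

```python
import struct


def to_tf32(x: float) -> float:
    """Truncate a value to TF32-like precision (sketch: truncation, not hardware rounding)."""
    # Reinterpret the value as a 32-bit float bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Keep the sign, the 8 exponent bits, and the top 10 of 23 mantissa bits.
    bits &= 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


print(to_tf32(1.0 + 2**-10))  # smallest step above 1.0 that survives
print(to_tf32(1.0 + 2**-11))  # finer detail is dropped
```

The range of representable magnitudes matches float32, while fine mantissa detail is dropped, which is why existing float32 code can typically run on TF32 Tensor Cores without source changes.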

NVIDIA DGX A100 - Universal System for AI Infrastructure

Deploy at the Edge

NVIDIA EGX Platform makes it possible to drive real-time conversational AI while avoiding networking latency by processing high-volume speech and language data at the edge. With NVIDIA TensorRT, developers can optimize models for inference and deliver conversational AI applications with low latency and high throughput. With the NVIDIA Triton Inference Server, the models can then be deployed in production. TensorRT and Triton Inference Server work with NVIDIA Riva, an application framework for conversational AI, for building and deploying end-to-end, GPU-accelerated pipelines on EGX. Under the hood, Riva applies TensorRT, configures the Triton Inference Server, and exposes services through a standard API, deploying with a single command through Helm charts on a Kubernetes cluster.
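For a sense of what calling a deployed model can look like, the following Python sketch builds a request for Triton Inference Server's HTTP inference endpoint (the KServe v2 protocol, POST /v2/models/<model>/infer). The server address, model name, and tensor name are hypothetical, and the helper functions are illustrative only. It does not show Riva's own API, which wraps these details behind its services.

```python
import json


def infer_url(base_url, model_name):
    """URL of the v2 inference endpoint for a given model."""
    return f"{base_url.rstrip('/')}/v2/models/{model_name}/infer"


def build_infer_request(input_name, shape, datatype, data):
    """JSON-serializable body with a single input tensor (data flattened row-major)."""
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": list(shape),
                "datatype": datatype,
                "data": list(data),
            }
        ]
    }


# Hypothetical server address, model name, and tensor name (assumptions).
url = infer_url("http://localhost:8000", "intent_classifier")
body = json.dumps(
    build_infer_request("input_ids", (1, 4), "INT64", [101, 2054, 2003, 102])
)
print(url)
print(body)
```

Sending the body with any HTTP client would require a running server that hosts a model with matching input names and shapes.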

Conversational AI Applications

Multi-Speaker Transcription

Classic speech-to-text algorithms have evolved, making it possible to transcribe meetings, lectures, and social conversations while simultaneously identifying speakers and labeling their contributions. NVIDIA Riva allows you to create accurate transcriptions in call centers and video conferencing meetings, and to automate clinical note taking during physician-patient interactions. With Riva, you can also customize models and pipelines to meet the needs of your specific use case.

NVIDIA Riva Enables the Fusion of Multi-Sensor Audio and Vision Data
AI-Driven Services to Engage with Customers

Virtual Assistant

Virtual assistants can engage with customers in a nearly human-like way, powering interactions in contact centers, smart speakers, and in-car intelligent assistants. AI-driven services like speech recognition, language understanding, voice synthesis, and vocoding alone cannot support such a system, as they’re missing key components such as dialogue tracking. Riva supplements these backbone services with easy-to-use components that can be extended for any application.

Accelerating Enterprises and Developer Libraries

Ecosystem Partners

GPU-accelerate top speech, vision, and language workflows to meet enterprise-scale requirements.

  • Intelligent Voice

Developer Libraries

Build GPU-accelerated, state-of-the-art deep learning models with popular conversational AI libraries.

  • Hugging Face

Industry Use Cases

Curai’s Platform to Enhance Patient Experience

Chat-Based App Enhances Patient Experience

Using natural language processing, Curai’s platform lets patients share their conditions with their doctors and access their own medical records, and it helps providers extract data from medical conversations to better inform treatment.

Square Takes Edge Off Conversational AI with GPUs

Learn about Square Assistant, a conversational AI engine that empowers small businesses to communicate with their customers more efficiently.

Natural Language Processing for Fraud Prevention

It’s estimated that, by 2023, businesses will save over $200 billion with fraud prevention. Learn how NLP can detect fraud across multiple channels, and how American Express, Bank of New York Mellon, and PayPal are using it in their fraud detection strategies.

Get Started Accelerating Conversational AI Today

Train Smarter with NVIDIA TAO Toolkit

Run Training on NVIDIA DGX A100 Systems

Simplify Deployment with NVIDIA Riva

Deploy to the Edge on the NVIDIA EGX Platform

Easily Build Models with NVIDIA NeMo