AI-driven services in speech, vision, and language offer a revolutionary path to personalized, natural conversation, but they must meet strict accuracy and latency requirements for real-time interactivity. With NVIDIA’s conversational AI SDK, developers can quickly build and deploy state-of-the-art multimodal AI services on a single unified architecture, delivering highly accurate, low-latency systems with little upfront investment.
Harness conversational AI models from NGC™ that are trained on various open and proprietary datasets for more than 100,000 hours on NVIDIA DGX™ systems.
Customize speech, language, and vision skills on your domain using TAO Toolkit.
Deploy optimized models in the cloud, in the data center, and at the edge with a single command.
Accelerate the full pipeline and execute model inference well under the 300-millisecond (ms) latency bound.
Execute full conversational AI pipelines consisting of automatic speech recognition (ASR) for audio transcription, natural language understanding (NLU), and text-to-speech (TTS) well under the 300 ms latency bound for real-time interactions, leaving headroom to increase pipeline complexity without sacrificing user experience.
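As a rough illustration of the 300 ms budget described above, the sketch below times stub ASR, NLU, and TTS stages end to end. The stage functions, their per-stage delays, and the payloads are all invented for illustration; in a real deployment, GPU-accelerated Riva services would replace the stubs.

```python
import time

def asr(audio: bytes) -> str:
    time.sleep(0.05)  # stub: pretend transcription takes ~50 ms
    return "what is the weather today"

def nlu(text: str) -> dict:
    time.sleep(0.02)  # stub: pretend intent classification takes ~20 ms
    return {"intent": "weather_query"}

def tts(text: str) -> bytes:
    time.sleep(0.08)  # stub: pretend speech synthesis takes ~80 ms
    return b"\x00" * 16000  # fake audio bytes

def pipeline(audio: bytes, budget_ms: float = 300.0):
    # Run ASR -> NLU -> TTS sequentially and check the total against the budget.
    start = time.perf_counter()
    text = asr(audio)
    intent = nlu(text)
    reply_audio = tts("It is sunny.")
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms, elapsed_ms <= budget_ms

elapsed, within_budget = pipeline(b"...")
print(f"pipeline latency: {elapsed:.0f} ms, within budget: {within_budget}")
```

The stub delays sum to roughly 150 ms, leaving about half the budget free, which is the headroom the text refers to for adding pipeline stages.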
The NVIDIA A100 Tensor Core GPU delivered record-setting performance in the MLPerf Training v0.7 benchmark, clocking in at 6.53 hours per accelerator for BERT on Wikipedia and 0.83 minutes at scale.
Speed up development time by 10X using production-quality NVIDIA pretrained models and TAO Toolkit.
Accelerate time to solution by training powerful billion-parameter language models with unmatched speed and scalability.
Deploy optimized conversational AI services for maximum performance in the cloud, in the data center, and at the edge.
Enable real-time conversation while avoiding networking latency by processing high-volume speech and language data at the edge.
NVIDIA DGX™ A100 features eight NVIDIA A100 Tensor Core GPUs—the most advanced data center accelerator ever made. Tensor Float 32 (TF32) precision delivers a 20X AI performance improvement over previous generations—without any code change—and an additional 2X performance boost by leveraging structural sparsity across common NLP models. Third-generation NVIDIA® NVLink®, second-generation NVIDIA NVSwitch™, and NVIDIA Mellanox® InfiniBand enable ultra-high-bandwidth and low-latency connections between all the GPUs. This allows multiple DGX A100 systems to train massive billion-parameter models at scale to deliver state-of-the-art accuracy. And with NVIDIA NeMo™, an open-source toolkit, developers can build, train, and fine-tune DGX-accelerated conversational AI models with only a few lines of code.
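The generational gains quoted above compound rather than add: the 2X boost from structural sparsity applies on top of the 20X TF32 improvement. A quick sanity check of that headline math:

```python
# Headline per-generation gains for common NLP models, taken from the text above.
tf32_speedup = 20.0      # TF32 vs. previous-generation training, no code change
sparsity_speedup = 2.0   # additional boost from structural sparsity

# The gains multiply because sparsity accelerates the already TF32-accelerated math.
combined = tf32_speedup * sparsity_speedup
print(f"combined speedup: {combined:.0f}X")  # → combined speedup: 40X
```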
NVIDIA EGX™ Platform makes it possible to drive real-time conversational AI while avoiding networking latency by processing high-volume speech and language data at the edge. With NVIDIA TensorRT™, developers can optimize models for inference and deliver conversational AI applications with low latency and high throughput. With the NVIDIA Triton™ Inference Server, the models can then be deployed in production. TensorRT and Triton Inference Server work with NVIDIA Riva, an application framework for conversational AI, for building and deploying end-to-end, GPU-accelerated multimodal pipelines on EGX. Under the hood, Riva applies TensorRT, configures the Triton Inference Server, and exposes services through a standard API, deploying with a single command through Helm charts on a Kubernetes cluster.
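The "standard API" pattern described above, in which one endpoint fronts ASR, NLU, and TTS services, can be pictured with the toy dispatcher below. The class, service names, and payloads are invented for illustration; a real application would instead call the gRPC/HTTP endpoints that Riva and Triton Inference Server expose.

```python
# Illustrative only: a toy dispatcher mimicking a single standard API that
# routes requests to registered conversational AI services.
from typing import Any, Callable, Dict

class ConversationalAPI:
    def __init__(self) -> None:
        self._services: Dict[str, Callable[[Any], Any]] = {}

    def register(self, name: str, fn: Callable[[Any], Any]) -> None:
        self._services[name] = fn

    def call(self, name: str, payload: Any) -> Any:
        if name not in self._services:
            raise KeyError(f"unknown service: {name}")
        return self._services[name](payload)

api = ConversationalAPI()
# Stub services standing in for GPU-backed ASR, NLU, and TTS models.
api.register("asr", lambda audio: "turn on the lights")
api.register("nlu", lambda text: {"intent": "light_on"})
api.register("tts", lambda text: b"audio-bytes")

text = api.call("asr", b"...")
intent = api.call("nlu", text)
print(intent)  # → {'intent': 'light_on'}
```

Keeping every skill behind one API is what lets the whole pipeline be deployed and versioned as a unit, for example via a single Helm command on Kubernetes as the text notes.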
Classic speech-to-text algorithms have evolved to the point where it's now possible to transcribe meetings, lectures, and social conversations while simultaneously identifying speakers and labeling their contributions. NVIDIA Riva enables the fusion of multi-sensor audio and vision data into a single stream of information used for advanced transcription components, like the visual diarization necessary to differentiate multiple voices in real time.
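One way to picture audio-visual fusion for diarization is as aligning per-modality labels on a shared timeline. The toy example below (all timestamps, names, and data are fabricated; Riva's actual fusion is far more sophisticated) assigns each transcribed segment the visually identified speaker whose time span overlaps it most:

```python
# Toy audio-visual diarization fusion: for each transcribed segment, pick the
# visually tracked speaker with the largest temporal overlap.

def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    # Length of the intersection of two time intervals, in seconds.
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

# (start_s, end_s, text) segments from a stub ASR stream
segments = [(0.0, 2.0, "hello everyone"), (2.0, 4.5, "thanks for joining")]

# (start_s, end_s, person_id) spans from a stub visual speaker tracker
visual_tracks = [(0.0, 1.8, "alice"), (1.8, 5.0, "bob")]

labeled = []
for s_start, s_end, text in segments:
    speaker = max(
        visual_tracks,
        key=lambda t: overlap(s_start, s_end, t[0], t[1]),
    )[2]
    labeled.append((speaker, text))

print(labeled)  # → [('alice', 'hello everyone'), ('bob', 'thanks for joining')]
```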
Virtual assistants can engage with customers in a nearly human-like way, powering interactions in contact centers, smart speakers, and in-car intelligent assistants. AI-driven services like speech recognition, language understanding, voice synthesis, and vocoding alone cannot support such a system, as they’re missing key components such as dialogue tracking. Riva supplements these backbone services with easy-to-use components that can be extended for any application.
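As a sketch of the dialogue-tracking component mentioned above, the minimal slot-filling tracker below accumulates NLU results across turns until the assistant has what it needs to act. The slot names, turns, and recency policy are invented for illustration and do not reflect Riva's actual components.

```python
# Minimal slot-filling dialogue state tracker: accumulate slots across turns
# so downstream components can act once all required slots are filled.
class DialogueState:
    REQUIRED = ("intent", "city", "date")

    def __init__(self) -> None:
        self.slots: dict = {}

    def update(self, nlu_result: dict) -> None:
        # Later turns overwrite earlier values (a simple recency policy).
        self.slots.update({k: v for k, v in nlu_result.items() if v is not None})

    def missing(self) -> list:
        return [s for s in self.REQUIRED if s not in self.slots]

state = DialogueState()
state.update({"intent": "book_flight", "city": "Denver"})
print(state.missing())  # → ['date']
state.update({"date": "2021-06-01"})
print(state.missing())  # → []
```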
GPU-accelerate top speech, vision, and language workflows to meet enterprise-scale requirements.
Build GPU-accelerated, state-of-the-art deep learning models with popular conversational AI libraries.
Using natural language processing, Curai’s platform lets patients share their conditions with their doctors and access their own medical records, and helps providers extract data from medical conversations to better inform treatment.
Learn about Square Assistant, a conversational AI engine that empowers small businesses to communicate with their customers more efficiently.
Discover what the enterprise journey should look like for successful implementation and how to demonstrate ROI for your business.