The NVIDIA EGX™ Platform makes it possible to drive real-time conversational AI while avoiding networking latency, because high-volume speech and language data is processed at the edge. With NVIDIA TensorRT™, developers can optimize models for inference and deliver conversational AI applications with low latency and high throughput. The optimized models can then be deployed in production with the NVIDIA Triton™ Inference Server. TensorRT and Triton Inference Server work with NVIDIA Jarvis, an application framework for conversational AI, to build and deploy end-to-end, GPU-accelerated multimodal pipelines on EGX. Under the hood, Jarvis applies TensorRT optimizations, configures the Triton Inference Server, and exposes services through a standard API; the whole stack deploys with a single command via Helm charts on a Kubernetes cluster.
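To make the TensorRT step concrete, the sketch below shows how a trained model exported to ONNX can be compiled into a serialized TensorRT engine (a `.plan` file) that Triton can serve. It uses the TensorRT 7/8-era Python API; the file names (`asr_model.onnx`, `asr_model.plan`), the FP16 flag, and the workspace size are illustrative assumptions, not settings prescribed by Jarvis, which performs this step for you.

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path: str, plan_path: str) -> None:
    """Compile an ONNX model into a serialized TensorRT engine ("plan")."""
    builder = trt.Builder(TRT_LOGGER)
    # Explicit-batch networks are required when importing ONNX models.
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)  # assumption: FP16 for lower latency
    config.max_workspace_size = 1 << 30    # assumption: 1 GiB of scratch space

    engine = builder.build_engine(network, config)
    with open(plan_path, "wb") as f:
        f.write(engine.serialize())        # Triton can serve this .plan file

build_engine("asr_model.onnx", "asr_model.plan")
```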
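Once Triton is serving such an engine, an application can query it over gRPC. The following is a minimal sketch using the `tritonclient` Python package against Triton's default gRPC port; the model name (`asr_model`), tensor names (`AUDIO_SIGNAL`, `TRANSCRIPT`), and shapes are hypothetical placeholders, not the schema Jarvis exposes through its own service API.

```python
import numpy as np
import tritonclient.grpc as grpcclient

# Triton's gRPC endpoint listens on port 8001 by default.
client = grpcclient.InferenceServerClient(url="localhost:8001")

# Placeholder input: one second of 16 kHz mono audio (assumed shape).
audio = np.zeros((1, 16000), dtype=np.float32)

infer_input = grpcclient.InferInput("AUDIO_SIGNAL", list(audio.shape), "FP32")
infer_input.set_data_from_numpy(audio)

result = client.infer(
    model_name="asr_model",  # assumed model name for illustration
    inputs=[infer_input],
    outputs=[grpcclient.InferRequestedOutput("TRANSCRIPT")],
)
print(result.as_numpy("TRANSCRIPT"))
```

In a Jarvis deployment, application code would typically call the higher-level Jarvis services rather than Triton directly; the sketch only illustrates the serving layer that Jarvis configures underneath.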