Accelerate Innovation in the Cloud

Diagnosing cancer. Predicting hurricanes. Automating business operations. These are some of the breakthroughs possible when you use accelerated computing to uncover the insights hidden in vast volumes of data. Amazon Web Services (AWS) and NVIDIA have collaborated for over 13 years to deliver the most powerful and advanced GPU-accelerated cloud, helping customers build a more intelligent future.

Power New Capabilities With AWS and NVIDIA

Healthcare

Deliver personalized medicine and accelerate breakthroughs in biomedical research with AWS and NVIDIA solutions.

Media and Entertainment

Realize the potential of cloud computing for digital content creation. Adapt your resources as your studio’s demands grow, and access the best creative talent across the globe.

Financial Services

Boost risk management, improve data-backed decisions and security, and enhance customer experiences with generative AI, deep learning, machine learning, and natural language processing (NLP) solutions.

Digital Twins and the Metaverse

Harness the power of large-scale simulation for industrial and scientific applications.

Enterprise AI and Machine Learning

Reduce development time, lower costs, improve accuracy and performance, and have more confidence in AI outcomes with NVIDIA solutions running on AWS.

High-Performance Computing

Learn how AWS and NVIDIA high-performance computing (HPC) solutions are optimized to work together, cost-effectively solving the world’s most complex problems.

Explore Customer Stories

Perplexity: Serving 800+ Million User Queries per Month With AI

Perplexity built pplx-api using NVIDIA A100 Tensor Core GPUs on AWS and NVIDIA TensorRT™-LLM, achieving up to 3.1x lower latency and 4.3x lower first-token latency compared to other platforms. The startup slashed inference costs by 4x—saving $600,000 annually—while scaling to hundreds of GPUs, with NVIDIA H100 GPUs delivering 50% lower latency and 200% higher throughput than A100s.

Noetik: Powering Precision Cancer Therapies With Machine Learning

Noetik, a member of the NVIDIA Inception Program, uses NVIDIA Hopper™ Tensor Core GPUs on Amazon SageMaker HyperPod to train multimodal foundation models for precision cancer immunotherapy. This enables processing of 1 petabyte of human tumor data—profiling over 200 million cells—to accelerate therapeutic discovery and unlock treatments tailored to individual patients.

Fireworks.ai: Generative AI Inference for Developers

Fireworks.ai built a lightning-fast, cost-optimized generative AI inference solution using Amazon EC2 P5 instances powered by NVIDIA H100 Tensor Core GPUs. The platform delivers 4x higher throughput per instance than open-source solutions, cuts latency by up to 50%, and reduces overall costs by 4x for some customers. Developers can run, fine-tune, and customize foundation models including Llama 2, Stable Diffusion XL, and StarCoder while meeting HIPAA and SOC 2 Type II compliance standards.

A-Alpha Bio: AI-Accelerated Drug Discovery

A-Alpha Bio accelerated drug discovery by deploying NVIDIA BioNeMo™ on AWS, achieving 12x faster inference and processing 108 million protein-binding predictions—10x more than initially planned. Using Amazon EC2 P5 instances powered by NVIDIA H100 Tensor Core GPUs, the biotech startup reduced experimental cycles by 1–2 iterations, cutting costs while discovering superior monoclonal antibody candidates for therapeutics.

Synthesia: AI-Enhanced Video Production

Synthesia transformed AI video production by deploying NVIDIA GPU-powered Amazon EC2 instances, achieving a 30x improvement in ML model training throughput. Using Amazon EC2 P5 instances with NVIDIA H100 Tensor Core GPUs and P4 instances with NVIDIA A100 GPUs, the AI startup reduced training time for voice models from days to hours while supporting 456% user growth.

Innophore: Advancing Speed, Accuracy, and Scale in Drug Discovery

Innophore accelerates drug discovery using NVIDIA BioNeMo to analyze protein structures with its Catalophore technology. The platform mapped the protein structures of the entire human organism in two weeks—a task that previously took over a year—and improves accuracy in predicting off-target drug effects by 30% within top-ranked hits.

NVIDIA Accelerated Infrastructure—From Cloud to Edge—on AWS

Amazon Elastic Compute Cloud (Amazon EC2)

Access a broad range of NVIDIA GPU-accelerated instances on Amazon EC2 on demand to meet the diverse computational requirements of AI, machine learning, data analytics, graphics, cloud gaming, virtual desktops, and HPC applications. From single-GPU instances to thousands of GPUs in EC2 UltraClusters, AWS customers can provision the right-sized GPU to accelerate time to solution and reduce the total cost of running their cloud workloads.
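
As a starting point, here is a minimal sketch of provisioning a single GPU instance with the boto3 SDK. The AMI ID, key pair, and region are placeholders (assumptions), and g5.xlarge is just one of many GPU instance types you could choose.

    import boto3

    # Create an EC2 client; the region is an assumption.
    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Launch one g5.xlarge instance (a single NVIDIA A10G GPU). The AMI ID and
    # key pair are placeholders; substitute a Deep Learning AMI and your own key.
    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",
        InstanceType="g5.xlarge",
        KeyName="my-key-pair",
        MinCount=1,
        MaxCount=1,
    )

    print(response["Instances"][0]["InstanceId"])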

Amazon EC2 G5 With NVIDIA A10G

Featuring NVIDIA A10G Tensor Core GPUs and support for NVIDIA RTX™ technology, EC2 G5 instances are ideal for graphics-intensive applications like video editing, rendering, 3D visualization, and photorealistic simulations. Additionally, they can be used to accelerate AI inference and single-GPU AI training workloads.

Amazon EC2 G5g With NVIDIA T4G

Featuring NVIDIA T4G Tensor Core GPUs and AWS Graviton2 processors, EC2 G5g instances are best suited for cloud game development and Android-in-the-cloud gaming services. They can also be used for cost-effective AI inference using Arm®-enabled software from the NVIDIA NGC™ catalog.

Amazon EC2 P4d With NVIDIA A100 40GB

Featuring eight NVIDIA A100 40GB Tensor Core GPUs, EC2 P4d instances deliver the highest performance for AI and HPC. For multi-node AI training and distributed HPC workloads, you can scale from a few to thousands of NVIDIA A100 GPUs in EC2 UltraClusters.

Amazon EC2 P5 With NVIDIA H100 80GB

Featuring eight NVIDIA H100 80GB Tensor Core GPUs, EC2 P5 instances deliver the highest performance in Amazon EC2 for deep learning and HPC applications. They help you accelerate your time to solution by up to 6x compared to previous-generation GPU-based EC2 instances and reduce the cost to train machine learning models by up to 40 percent.

AWS Hybrid Cloud and Edge Solutions

Leverage the power of NVIDIA-accelerated computing across a broad range of AWS hybrid cloud and edge solutions to meet the low-latency, real-time requirements of workloads like AI, machine learning, gaming, content creation, and augmented reality (AR) and virtual reality (VR) streaming. NVIDIA’s performance-optimized and cloud-native software stack ensures that you get the best performance for your applications, wherever they need to run—cloud to edge.

AWS Panorama

AWS Panorama is a collection of machine learning devices and an SDK that brings computer vision to on-premises internet protocol (IP) cameras. AWS Panorama edge devices are built on NVIDIA Jetson™ system-on-modules (SOMs) and use the NVIDIA JetPack™ SDK to accelerate AI at the edge for industrial inspection, traffic monitoring, and supply chain management use cases.

AWS Outposts

With NVIDIA T4 Tensor Core GPUs in AWS Outposts, you can meet security and latency requirements in a wide variety of AI and graphics applications in on-premises data centers. Combined with access to GPU-optimized software from NGC, you can derive insights from vast amounts of data orders of magnitude faster than with CPUs alone.

AWS Wavelength

AWS Wavelength brings the AWS cloud to the edge of the 5G mobile network to develop and deploy ultra-low-latency applications. AWS Wavelength Zones offer access to NVIDIA GPU-accelerated instances to speed up applications such as game streaming, AR/VR, and AI inference at the edge.

AWS IoT Greengrass

AWS IoT Greengrass extends AWS services to edge devices, such as NVIDIA Jetson platforms, so you can develop AI models in the cloud and deploy them at the edge to act locally on the data those devices generate. Combined with the NVIDIA DeepStream SDK, you can build and deploy high-throughput, low-latency vision AI applications at the edge.

Simplify Development and Maximize Performance With NVIDIA-Optimized Software

NVIDIA-Optimized Software on AWS

Access the computational power of NVIDIA GPU-accelerated instances on AWS to develop and deploy your applications at scale with fewer compute resources, accelerating time to solution and reducing TCO. To maximize performance and developer productivity, NVIDIA offers GPU-optimized software for a broad range of workloads, including data science, data analytics, AI and machine learning training and inference, HPC, and graphics.

NVIDIA NGC

NVIDIA NGC is the portal for enterprise services, software, management tools, and support for end-to-end AI and digital twin workflows. The NGC software catalog provides a range of resources that meet the needs of data scientists, developers, and researchers with varying levels of expertise, including containers, pretrained models, domain-specific SDKs, use case-based collections, and Helm charts for the fastest AI implementations. To take AI workloads to production with NGC software, you can access enterprise-grade support, training, and services with NVIDIA AI Enterprise.
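
As a simple illustration, a sketch of pulling a GPU-optimized framework container from the NGC catalog with the Docker SDK for Python; the repository path is from the public catalog, while the tag shown is an example and may not be current.

    import docker

    # Connect to the local Docker daemon and pull an NGC container image.
    # Requires Docker to be running; the tag is an assumption, check the catalog.
    client = docker.from_env()
    image = client.images.pull("nvcr.io/nvidia/pytorch", tag="24.05-py3")
    print(image.id)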

NVIDIA AI Enterprise on AWS

NVIDIA AI Enterprise is a secure, end-to-end, cloud-native suite of AI software. It accelerates data science pipelines and streamlines the development, deployment, and management of predictive AI models to automate essential processes and deliver rapid insights from data. NVIDIA AI Enterprise includes an extensive library of full-stack software, including NVIDIA AI workflows, frameworks, pretrained models, and infrastructure optimization. Global enterprise support and regular security reviews ensure business continuity and that AI projects stay on track.

NVIDIA RTX Virtual Workstation

The NVIDIA RTX Virtual Workstation (RTX vWS) for GPU-accelerated graphics helps creative and technical professionals maximize their productivity from anywhere by providing access to the most demanding professional design and engineering applications from the cloud. Amazon EC2 G5 (NVIDIA A10G) and G4dn (NVIDIA T4) instances, combined with the RTX vWS Amazon Machine Image (AMI), enable the industry’s most advanced 3D graphics platform, including the latest real-time ray tracing with RTX technology in virtual machines.

NVIDIA-Accelerated AWS Services

NVIDIA and AWS collaborate closely on integrations to bring the power of NVIDIA-accelerated computing to a broad range of AWS services. Whether you provision and manage the NVIDIA GPU-accelerated instances on AWS yourself or leverage them in managed services like Amazon SageMaker or Amazon Elastic Kubernetes Service (EKS), you have the flexibility to choose the optimal level of abstraction you need.

Amazon EMR

Leverage the NVIDIA RAPIDS™ Accelerator for Apache Spark within Amazon EMR to accelerate Apache Spark 3.x data science pipelines on NVIDIA GPU-accelerated AWS instances without any code changes. This integration enables data scientists to run their extract, transform, and load (ETL), data processing, and machine learning pipelines at massive scale while lowering cloud costs by getting more done in less time with fewer instances.
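
To make "without any code changes" concrete, here is a minimal PySpark sketch that enables the RAPIDS Accelerator plugin through Spark configuration; on Amazon EMR the same properties are typically supplied as cluster configuration, and the S3 paths and column name below are placeholders.

    from pyspark.sql import SparkSession

    # Enable the RAPIDS Accelerator plugin; eligible SQL/DataFrame operators
    # then run on the GPU while the pipeline code itself stays unchanged.
    spark = (
        SparkSession.builder
        .appName("rapids-etl-sketch")
        .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
        .config("spark.rapids.sql.enabled", "true")
        .getOrCreate()
    )

    # Placeholder S3 paths; this is the same DataFrame code a CPU-only run would use.
    df = spark.read.parquet("s3://my-bucket/events/")
    df.groupBy("user_id").count().write.parquet("s3://my-bucket/daily-counts/")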

Amazon SageMaker

NVIDIA AI software and GPU-accelerated instances can accelerate each step of AI and machine learning workflows within Amazon SageMaker, including data preparation, model training, and inference serving. To deploy AI models into production faster and lower inference costs, Amazon SageMaker has integrated NVIDIA Triton™ Inference Server, enabling features like multi-framework support, dynamic batching, and concurrent model execution that maximize performance on both CPU and GPU instances on AWS.
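
A minimal sketch of deploying a model behind a Triton container on a real-time SageMaker endpoint with the SageMaker Python SDK is shown below; the image URI, model artifact, IAM role, and instance type are placeholders (assumptions), not values from this page.

    import sagemaker
    from sagemaker.model import Model

    session = sagemaker.Session()

    # Use the SageMaker Triton container for your region and your own packaged
    # Triton model repository (model.tar.gz); everything below is a placeholder.
    triton_model = Model(
        image_uri="<account>.dkr.ecr.<region>.amazonaws.com/sagemaker-tritonserver:<tag>",
        model_data="s3://my-bucket/models/model.tar.gz",
        role="arn:aws:iam::123456789012:role/MySageMakerRole",
        sagemaker_session=session,
    )

    # Create a real-time endpoint on a single-GPU instance (assumed instance type).
    predictor = triton_model.deploy(
        initial_instance_count=1,
        instance_type="ml.g5.xlarge",
    )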

Amazon Titan

The team of experienced scientists and developers at AWS building Amazon Titan foundation models for Amazon Bedrock, a generative AI service, uses NVIDIA NeMo™, an end-to-end, cloud-native framework for building, customizing, and deploying generative AI models anywhere.

In addition, the AWS Elastic Fabric Adapter (EFA) provides customers with UltraCluster networking infrastructure that can directly connect more than 10,000 GPUs, bypassing the operating system and CPU with NVIDIA GPUDirect®.

Developer Resources and Quick-Start Guides

MONAI Label Workshops

Learn how you can make use of MONAI—an open-source AI framework for healthcare—in your work. Join us for hands-on experience.

BioNeMo Now on AWS

Researchers and developers at leading pharmaceutical and techbio companies can now easily deploy NVIDIA Clara™ software and services, including NVIDIA BioNeMo™, for accelerated healthcare through AWS.

Accelerate Your Startup

Explore the program that provides cutting-edge startups around the world with critical access to go-to-market support, technical expertise, training, and funding opportunities.

AI Capabilities Using TensorRT-LLM

Previously, creating detailed product listings required significant time and effort from sellers; this simplified process gives them more time to focus on other tasks. The NVIDIA TensorRT-LLM software is available today on GitHub and can be accessed through NVIDIA AI Enterprise, which offers enterprise-grade security, support, and reliability for production AI.
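
As a rough illustration of how text generation looks with TensorRT-LLM's high-level Python API, here is a hedged sketch; it assumes a recent TensorRT-LLM release that ships the LLM class, and the model name, prompt, and sampling values are placeholders.

    from tensorrt_llm import LLM, SamplingParams

    # Build and run a TensorRT engine for a small open model (placeholder name).
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
    params = SamplingParams(temperature=0.8, max_tokens=128)

    # Generate text for a single prompt and print the first completion.
    outputs = llm.generate(["Draft a short product listing for a ceramic coffee mug."], params)
    print(outputs[0].outputs[0].text)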

NVIDIA CloudXR

NVIDIA CloudXR™ is NVIDIA’s extended reality (XR) streaming technology, built on RTX and RTX Virtual Workstation software. By using CloudXR alongside Amazon NICE DCV streaming protocols, you can use on-demand compute resources for all aspects of your immersive application development.

NVIDIA Triton Inference Server in Amazon SageMaker

This blog provides an overview of NVIDIA Triton Inference Server and SageMaker, shows the benefits of using Triton Inference Server containers, and showcases how easy it is to deploy your own machine learning models. A sample notebook that accompanies the blog post is available for download.

NVIDIA Riva at Scale With Amazon EKS

This step-by-step guide shows you how to deploy and scale NVIDIA Riva speech skills on Amazon EKS with Traefik-based load balancing.
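
Once Riva is running behind the load balancer, client calls are straightforward; below is a hedged sketch using the nvidia-riva-client Python package, where the gRPC endpoint host, port, and audio file are placeholders.

    import riva.client

    # Point the client at the gRPC endpoint exposed through the Traefik load balancer.
    auth = riva.client.Auth(uri="riva.example.com:50051")
    asr = riva.client.ASRService(auth)

    # Transcribe a local audio file offline; config fields kept to a minimum here.
    config = riva.client.RecognitionConfig(language_code="en-US", max_alternatives=1)

    with open("sample.wav", "rb") as f:
        response = asr.offline_recognize(f.read(), config)

    print(response.results[0].alternatives[0].transcript)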

Amazon Music Uses SageMaker With NVIDIA to Optimize Machine Learning Training and Inference

Take a look inside the journey Amazon Music took to optimize performance and cost using SageMaker, NVIDIA Triton Inference Server, and NVIDIA TensorRT. We show how the seemingly simple, yet intricate, search bar works, ensuring a seamless Amazon Music experience with near-zero delay in correcting typos and relevant, real-time search results.

NVIDIA Clara Parabricks on AWS

Amazon.com, one of the most visited ecommerce websites in the world, uses an AI model that automatically corrects misspelled words in search queries so customers can shop more effortlessly. Amazon measures the success of its accelerated search results by latency—how fast typos are corrected—and the number of successful sessions.

Access the Power of AWS and NVIDIA

Amazon EC2 P5 Instances

NVIDIA AI Enterprise

NVIDIA RTX Virtual Workstations