Generative AI Framework


Build, customize, and deploy generative AI models.

What is the NVIDIA NeMo Framework?

NVIDIA NeMo™ framework, part of the NVIDIA AI platform, is an end-to-end, cloud-native enterprise framework to build, customize, and deploy generative AI models with billions of parameters.

The NeMo framework provides an accelerated workflow for training with 3D parallelism techniques. It offers a choice of several customization techniques and is optimized for at-scale inference of large language and image models in multi-GPU and multi-node configurations. NeMo makes generative AI model development easy, cost-effective, and fast for enterprises.

How the NeMo Framework builds, trains, and deploys large language models.

Build Foundation Models for Different Modalities

The NeMo framework supports the development of text-to-text, text-to-image, and image-to-image foundation models.

Language models:

  • BERT
  • GPT-3
  • T5
  • T5-MoE
  • Inform

Image models:

  • Stable Diffusion v1.5
  • Vision Transformers (ViT)
  • CLIP
  • Instruct-Pix2Pix
  • Imagen

Build Trustworthy, Safe, and Secure LLM Applications

Programmable Guardrails for LLM-Based Applications

NeMo Guardrails is a toolkit for easily developing trustworthy, safe, and secure LLM conversational systems. It natively supports LangChain, adding a layer of safety, security, and topical guardrails to LLM-based conversational applications.
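To make the idea of a topical guardrail concrete, here is a minimal stdlib-only sketch of a pre-filter that keeps prompts inside an allowed domain before they reach the LLM. This is an illustration of the concept only, not the NeMo Guardrails API; the real toolkit defines rails declaratively in Colang configuration files.

```python
# Illustrative topical guardrail: a hypothetical pre-filter that blocks
# prompts outside an allowed domain before they reach the LLM.
# NOT the NeMo Guardrails API; the real toolkit uses Colang config files.

BLOCKED_TOPICS = {"politics", "medical advice"}

def guardrail_check(prompt: str) -> bool:
    """Return True if the prompt passes the topical guardrail."""
    lowered = prompt.lower()
    return not any(topic in lowered for topic in BLOCKED_TOPICS)

def answer(prompt: str, llm=lambda p: f"LLM answer to: {p}") -> str:
    # Route blocked prompts to a canned refusal instead of the model.
    if not guardrail_check(prompt):
        return "I'm sorry, I can't help with that topic."
    return llm(prompt)
```

In practice the rail would be a classifier or a dialogue-flow rule rather than a keyword list, but the control flow (check, then either refuse or forward to the model) is the same.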

Explore the Benefits

Fastest Training on GPUs

Use state-of-the-art training techniques to maximize throughput and minimize training time for foundation models with billions or trillions of parameters.

Easy to Use

A cloud-native framework with all dependencies prepackaged and installed, with validated recipes for training language and image generative AI models to convergence and deploying them for inference.

Fully Flexible

An open-source approach offering full flexibility across the pipeline—from data processing, to training, to inference of generative AI models.

Run in the Cloud and On Premises

Train and deploy foundation models of any size on any GPU infrastructure. Supported on all NVIDIA DGX™ systems, NVIDIA DGX Cloud™, Microsoft Azure, Oracle Cloud Infrastructure, and Amazon Web Services.


Customization

Offers tools to customize foundation models for enterprise hyper-personalization.

Enterprise Support

Battle hardened, tested, and verified containers built for enterprises.

Try the NeMo Framework Through a Free, Hands-On Lab on NVIDIA LaunchPad

Key Features to Develop Large Language Models

State-of-the-Art Training Techniques

The NeMo framework delivers high levels of training efficiency, making training of large-scale foundation models possible, using 3D parallelism techniques such as:

  • Tensor parallelism to scale models within nodes
  • Data and pipeline parallelism to scale data and models across thousands of GPUs 
  • Sequence parallelism to distribute activation memory across tensor parallel devices

In addition, selective activation recomputing optimizes recomputation and memory usage across tensor parallel devices during backpropagation.
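The three parallelism dimensions above multiply together to cover the cluster: the GPUs not consumed by tensor and pipeline parallelism become data-parallel replicas. A small arithmetic sketch (the sizes below are hypothetical examples, not a NeMo API):

```python
# Sketch of how a GPU cluster is factored into the three dimensions of
# 3D parallel training. world_size must equal TP * PP * DP.
# All sizes below are hypothetical illustrations.

def data_parallel_size(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """GPUs left over after tensor and pipeline parallelism become data-parallel replicas."""
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError("world_size must be divisible by TP * PP")
    return world_size // model_parallel

# Example: 1,024 GPUs, tensor parallelism of 8 (within a node) and
# pipeline parallelism of 16 (across nodes) leave 8 data-parallel replicas.
dp = data_parallel_size(world_size=1024, tensor_parallel=8, pipeline_parallel=16)
```

Tensor parallelism is usually kept within a node because it is communication-heavy, while pipeline and data parallelism span nodes, which matches the "within nodes" / "across thousands of GPUs" split described above.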

Customization Tools

The NeMo framework makes enterprise AI practical by offering tools to:

  • Define focus and guardrails: Define the operating domain and guardrails for hyper-personalized enterprise models, using fine-tuning, prompt learning, and adapter techniques, to prevent LLMs from veering into unwanted domains or producing inappropriate responses.
  • Include domain-specific knowledge: Encode and embed your enterprise's real-time information so models provide up-to-date responses, using NVIDIA Inform.
  • Include functional skills: Add specialized skills to solve customer and business problems, and get better responses by providing context for hyper-personalized use cases with prompt learning techniques.
  • Continuously improve the model: Reinforcement learning from human feedback (RLHF) techniques let your enterprise model get smarter over time, aligned with human intentions.
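The "include domain-specific knowledge" item above amounts to grounding a prompt in enterprise facts before it reaches the model. A minimal sketch of that idea, with hypothetical helper names (this is not the NVIDIA Inform API):

```python
# Minimal sketch of grounding a prompt in enterprise context before it
# reaches the model. Helper names and the lookup scheme are hypothetical,
# for illustration only; NVIDIA Inform's actual interface differs.

def build_grounded_prompt(question: str, knowledge_base: dict[str, str]) -> str:
    """Prepend any matching enterprise facts to the question as context."""
    context = [fact for key, fact in knowledge_base.items() if key in question.lower()]
    if not context:
        return f"Question: {question}"
    header = "\n".join(context)
    return f"Context:\n{header}\n\nQuestion: {question}"

kb = {"refund": "Refunds are processed within 5 business days."}
prompt = build_grounded_prompt("What is your refund policy?", kb)
```

A real system would retrieve context with embeddings rather than keyword matching, but the pattern of injecting retrieved facts ahead of the question is the same.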

Optimized AI Inference With NVIDIA Triton

Deploy generative AI models for inference using NVIDIA Triton Inference Server™. With powerful optimizations from FasterTransformer, you can achieve state-of-the-art accuracy, latency, and throughput inference performance on single-GPU, multi-GPU, and multi-node configurations.
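Triton's HTTP endpoint accepts inference requests in the KServe v2 protocol (POST /v2/models/&lt;name&gt;/infer). As a rough sketch of what a client sends, here is the JSON body construction; the tensor name, datatype, and shape below are hypothetical placeholders, not a specific model's signature.

```python
import json

# Sketch of a KServe v2 inference request body, the JSON format accepted by
# Triton's HTTP endpoint (POST /v2/models/<name>/infer). The input name,
# datatype, and shape are hypothetical placeholders for illustration.

def build_infer_request(input_name: str, token_ids: list[int]) -> str:
    body = {
        "inputs": [
            {
                "name": input_name,
                "shape": [1, len(token_ids)],   # batch of one sequence
                "datatype": "INT32",
                "data": token_ids,
            }
        ]
    }
    return json.dumps(body)

payload = build_infer_request("input_ids", [101, 2023, 102])
```

In practice you would use the `tritonclient` library rather than hand-building JSON, but the wire format is useful to know when debugging deployments.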

Data Processing at Scale

Bring your own dataset and tokenize it into a digestible format. NeMo includes comprehensive preprocessing capabilities for data filtration, deduplication, blending, and formatting of language datasets such as the Pile and multilingual C4 (mC4). These help researchers and engineers save months of development and compute time, letting them focus on building applications.
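Two of the steps named above, deduplication and filtering, can be sketched with the stdlib alone. This is an illustration of the concepts, not NeMo's actual preprocessing scripts, which also cover fuzzy deduplication, blending, and tokenization at scale:

```python
import hashlib

# Illustrative sketch of exact deduplication plus a simple quality filter,
# two of the preprocessing steps described above. NeMo's real pipeline is
# far more extensive (fuzzy dedup, blending, tokenization); this only
# shows the shape of the operations.

def preprocess(docs: list[str], min_words: int = 3) -> list[str]:
    seen: set[str] = set()
    kept: list[str] = []
    for doc in docs:
        digest = hashlib.sha256(doc.encode()).hexdigest()  # exact-duplicate fingerprint
        if digest in seen:
            continue                        # deduplication
        if len(doc.split()) < min_words:
            continue                        # crude quality filter
        seen.add(digest)
        kept.append(doc)
    return kept

corpus = ["the cat sat down", "the cat sat down", "hi"]
clean = preprocess(corpus)
```

Hashing each document keeps memory bounded relative to storing full texts, which matters when the corpus is web-scale.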

Easy-to-Use Recipes and Tools for Generative AI

The NeMo framework makes generative AI possible from day one with prepackaged scripts, reference examples, and documentation across the entire pipeline.

Building foundation models is also made easy through an auto-configurator tool, which automatically searches for the best hyperparameter configurations to optimize training and inference for any given multi-GPU configuration, training, or deployment constraints.
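At its core, an auto-configurator enumerates parallelism configurations that are valid for a given GPU budget and picks the one minimizing an estimated cost. A toy sketch of that search follows; the cost model and the memory constraint are invented for illustration, since real tools measure or model these empirically:

```python
import itertools

# Toy sketch of an auto-configurator search: enumerate (TP, PP) settings
# valid for a GPU budget and pick the cheapest under an estimated cost.
# The cost model and min_model_parallel constraint are invented for
# illustration; real tools derive them from benchmarks and memory limits.

def best_config(world_size, tp_options, pp_options, min_model_parallel=8):
    def cost(tp: int, pp: int) -> float:
        # Hypothetical: communication overhead grows with TP, pipeline
        # bubble with PP.
        return tp * 1.0 + pp * 1.5

    valid = [
        (tp, pp)
        for tp, pp in itertools.product(tp_options, pp_options)
        if world_size % (tp * pp) == 0 and tp * pp >= min_model_parallel
    ]
    return min(valid, key=lambda c: cost(*c))

cfg = best_config(64, tp_options=[1, 2, 4, 8], pp_options=[1, 2, 4])
```

Real auto-configurators also sweep micro-batch size, activation recomputation, and sequence parallelism, but the structure is the same: constrain the search space, then optimize an estimated objective.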

Easily Customize and Use Generative AI Models Using NVIDIA NeMo Language and Image Services

NeMo Language Service

Cloud service for enterprise hyper-personalization and at-scale deployment of intelligent large language models.

Picasso Service

An accelerated cloud service for enterprises that uses custom generative AI models for creating high-res, photorealistic images, videos, and 3D content.

Customers Accelerating Generative AI and LLM Applications with NVIDIA NeMo Framework

Accelerating Industry Applications With LLMs

AI Sweden accelerated LLM industry applications by making the power of a 100-billion-parameter model for regional languages easily accessible to the Nordic ecosystem. AI Sweden is digitizing Sweden's historical records and building language models from this unstructured data that can be commercialized in enterprise applications.

Image Courtesy of Korea Telecom

Creating New Customer Experiences with LLMs

South Korea's leading mobile operator builds billion-parameter large language models, trained with the NVIDIA DGX SuperPOD platform and the NeMo framework, to power smart speakers and customer call centers.

Discover More Resources

Deploying a 1.3B GPT-3 Model With NVIDIA NeMo Framework

Learn how to download, optimize, and deploy a 1.3-billion-parameter GPT-3 model with NeMo, NVIDIA's generative AI framework.

Efficient At-Scale Training and Deployment of LLMs With NeMo Framework

Learn how to preprocess data in a multi-node environment, automatically select the best hyperparameters to minimize training time for multiple GPT-3 and T5 configurations, train the model at scale, and deploy the model in a multi-node production setting with an easy-to-use set of scripts.

Free Hands-On Lab on NVIDIA LaunchPad

Bootstrap your enterprise's LLM journey using pretuned hyperparameter configurations for GPT-3 models. Learn how to train a large-scale NLP model with NeMo framework.

Get Started Now with NVIDIA NeMo Framework