Interact with the latest state-of-the-art AI model APIs optimized on the NVIDIA accelerated computing stack—from your browser.
NVIDIA AI Foundation Models are moving to the new NVIDIA API Catalog!
Starting March 18, 2024, NVIDIA AI Foundation Models began migrating
to our API Catalog at build.nvidia.com; the migration is scheduled to
complete in the coming months. During this time, some models may be
temporarily unavailable. If you use the API, you will need to update
your API keys to reflect the new endpoints. By signing in with your
NGC account on the API Catalog, you will also receive free credits for
making API calls. Please update your bookmarks accordingly.
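As a rough sketch of what an updated API call might look like, the snippet below assembles a request for an OpenAI-compatible chat completion endpoint. The base URL, model name, and key format are assumptions for illustration, not confirmed details of the API Catalog; consult build.nvidia.com for the actual endpoints and your own key.

```python
import json

# Assumed values for illustration only -- check build.nvidia.com for the
# real base URL, model identifiers, and your personal API key.
BASE_URL = "https://integrate.api.nvidia.com/v1"  # assumed endpoint
API_KEY = "nvapi-..."  # placeholder; replace with your own key

def build_chat_request(prompt, model="meta/llama2-70b"):
    """Assemble the URL, headers, and JSON body for a chat completion call."""
    url = f"{BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Accept": "application/json",
    }
    body = {
        "model": model,  # assumed model name for illustration
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return url, headers, json.dumps(body)

url, headers, body = build_chat_request("Hello!")
```

The request can then be sent with any HTTP client; only the key and endpoint change relative to the previous NGC-hosted URLs.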
Experience AI in Action
Join a passionate community and work with state-of-the-art models to kick-start your own development efforts.
Maxine Live Portrait is a generative model that animates a portrait photo with a driving video, transferring the facial expressions and movements of the person in the video to the photo.
Smaug-72B is a large language model developed by Abacus.AI by finetuning a Qwen-72B-based model (ultimately LLaMA-2 architecture) using the DPO-Positive (DPOP) technique.
Kosmos-2 is a multimodal large language model (MLLM) designed to ground text to the visual world, enabling it to understand and reason about visual elements in images.
Phi-2 is a 2.7-billion-parameter language model developed by Microsoft Research. Phi-2 is best suited for prompts using the QA, chat, and code formats.
NVIDIA cuOpt is a world-record-breaking accelerated optimization engine. cuOpt helps teams solve complex routing problems with multiple constraints and deliver new capabilities, like dynamic rerouting, job scheduling, and robotic simulations.
The SeamlessM4T V2-T2TT model is part of the SeamlessM4T-v2 collection of models, which is designed to provide high-quality translation for various tasks, including speech and text translation.
NV-Llama2-70B-RLHF-Chat is a 70-billion-parameter generative language model instruction-tuned from the Llama 2 70B model. It accepts input with a context length of up to 4,096 tokens.
Llama 2 SteerLM Chat is a large language model aligned using the SteerLM technique developed by NVIDIA, which lets you adjust attributes of the response style (such as creativity, complexity, and verbosity) at inference time.
Yi-34B is a large language model trained from scratch by developers at 01.AI. It has been finetuned for various chat use cases and supports a context window of up to 200K tokens.
Nemotron-3-8B-Chat-SteerLM is an 8 billion parameter generative language model based on the Nemotron-3-8B base model. It has been customized for user control of model outputs during inference using the SteerLM method developed by NVIDIA.
NVIDIA Retrieval QA Embedding is an embedding model that represents words, phrases, or other entities as numeric vectors and captures the relationships between them.
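To make the vector idea concrete, the sketch below compares toy embeddings with cosine similarity, the standard way retrieval systems rank passages against a query. The four-dimensional vectors and their values are made up for demonstration; a real embedding model returns much higher-dimensional vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for embeddings a retrieval model would return.
query = [0.1, 0.9, 0.2, 0.4]        # embedding of the user question
passage_a = [0.1, 0.8, 0.3, 0.5]    # embedding of a relevant passage
passage_b = [0.9, 0.1, 0.7, 0.0]    # embedding of an unrelated passage

# The relevant passage scores higher, so it would be retrieved first.
assert cosine_similarity(query, passage_a) > cosine_similarity(query, passage_b)
```

In a question-answering pipeline, every document chunk is embedded once and stored; at query time only the question needs embedding, and the nearest chunks by this score are passed to the generator.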
A genome-scale language foundation model (GenSLM) is an LLM trained on all known genomes from a virus or bacteria. It learns the evolutionary landscape of viruses like SARS-CoV-2 and can accurately and rapidly identify new variants.
Nemotron-3-8B-QA is an 8-billion-parameter generative language model based on the Nemotron-3-8B base model, further fine-tuned by NVIDIA for instruction following, specifically for question answering.
The CLIP (Contrastive Language-Image Pretraining) model combines vision and language using contrastive learning. It understands images and text together, enabling tasks like image classification and object detection.
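The contrastive idea behind CLIP can be sketched as zero-shot classification: embed an image and several candidate captions into the same space, then pick the caption most similar to the image. The unit vectors below are made-up toys standing in for the outputs of CLIP's image and text encoders, not real model outputs.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def softmax(scores):
    """Convert raw similarity scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy embeddings; a real pipeline would get these from CLIP's encoders.
image_embedding = normalize([0.2, 0.9, 0.1])
label_embeddings = {
    "a photo of a cat": normalize([0.1, 0.95, 0.05]),
    "a photo of a dog": normalize([0.8, 0.2, 0.4]),
}

# Score each caption against the image, then convert to probabilities.
labels = list(label_embeddings)
scores = [sum(a * b for a, b in zip(image_embedding, label_embeddings[l]))
          for l in labels]
probs = dict(zip(labels, softmax(scores)))
best = max(probs, key=probs.get)  # "a photo of a cat"
```

Because the label set is just a list of strings, the same embeddings support new classes at inference time without retraining, which is what makes contrastive vision-language models flexible.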