NVIDIA-Certified Professional: AI Infrastructure (NCP-AII)

About This Certification

The NVIDIA-Certified Professional: AI Infrastructure (NCP-AII) certification is an intermediate-level credential that confirms a candidate’s ability to deploy, configure, and validate advanced NVIDIA AI infrastructure. The exam is delivered online with remote proctoring, contains 70-75 questions, and has a 120-minute time limit.

Please carefully review our certification FAQs and exam policies before scheduling your exam.

If you have any questions, please contact us.

Please note: To access the exam, you’ll need to create a Certiverse account.

Certification Exam Details

Duration: 120 minutes  

Price: $400 

Certification level: Professional  

Subject: AI Infrastructure  

Number of questions: 70-75

Prerequisites: Two to three years of operational experience working in a data center with NVIDIA hardware solutions. The candidate should be able to deploy every component of the data center infrastructure that supports AI workloads.

Language: English 

Validity: This certification is valid for two years from issuance. Recertification may be achieved by retaking the exam.

Credentials: Upon passing the exam, participants will receive a digital badge and an optional certificate indicating the certification level and topic.

Exam Preparation

Topics Covered in the Exam

The exam covers the following topics:

  • Install and configure servers & networks
  • Physical layer management
  • Troubleshoot and optimize systems and networks

Candidate Audiences

  • Data center administrators
  • Infrastructure administrators
  • Network administrators
  • Network engineers
  • Storage administrators
  • System administrators
  • Solution architects

Recommended Training

AI Infrastructure & Operations Fundamentals

A self-paced course that covers essential components of AI infrastructure, including compute platforms, networking, and storage solutions. The course also addresses AI operations, focusing on infrastructure management and cluster orchestration.

AI Infrastructure Professional Workshop

A multi-day workshop that covers the essential aspects of AI infrastructure in modern data centers, focusing on NVIDIA's cutting-edge technologies. The course provides a deep dive into optimizing AI workloads, managing GPU resources, and leveraging NVIDIA's ecosystem to build and maintain efficient AI-driven data centers.

Exam Study Guide

Review study guide

Exam Blueprint

The table below provides an overview of the topic areas covered in the certification exam and how much of the exam is focused on that subject.

Topic Area (% of Exam) and Topics Covered
System and Server Bring-up (31%)
  • Describe sequence of events for deployment and validation.
  • Describe network topologies for AI factories.
  • Perform initial configuration of BMC, OOB, and TPM. 
  • Perform firmware upgrades (including on HGX™) and fault detection.
  • Validate power and cooling parameters.
  • Install GPU-based servers (SMI).
  • Validate installed hardware.
  • Describe and validate cable types and transceivers.
  • Install physical GPUs.
  • Validate hardware operation for workloads.
  • Configure initial parameters for third-party storage.
Physical Layer Management (5%)
  • Configure and manage a BlueField® network platform.
  • Configure MIG (AI and HPC).
Control Plane Installation and Configuration (19%)
  • Install Base Command™ Manager (BCM), configure and verify HA.
  • Install OS.
  • Install Cluster (configure category, configure interfaces, install Slurm/Enroot/Pyxis).
  • Install/update/remove NVIDIA GPU and DOCA™ drivers.
  • Install the NVIDIA container toolkit.
  • Demonstrate how to use NVIDIA GPUs with Docker.
  • Install NGC™ CLI on hosts.
Cluster Test and Verification (33%)
  • Perform a single-node stress test.
  • Execute HPL (High-Performance Linpack).
  • Perform single-node NCCL (including verifying NVLink™ Switch).
  • Validate cables by verifying signal quality.
  • Confirm cabling is correct.
  • Confirm FW/SW on switches.
  • Confirm FW/SW on BlueField-3.
  • Confirm FW on transceivers.
  • Run ClusterKit to perform a multifaceted node assessment.
  • Run NCCL to verify E/W fabric bandwidth.
  • Perform NCCL burn-in.
  • Perform HPL burn-in.
  • Perform NeMo™ burn-in.
  • Test storage.
Troubleshoot and Optimize (12%)
  • Identify and troubleshoot hardware faults (e.g., GPU, fan, network card). 
  • Identify faulty cards, GPUs, and power supplies. 
  • Replace faulty cards, GPUs, and power supplies. 
  • Execute performance optimization for AMD and Intel servers. 
  • Optimize storage.
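
The blueprint weights above can be turned into a rough study-planning estimate. The sketch below is illustrative only: it assumes questions are allocated in proportion to each topic area's percentage, which the blueprint does not state explicitly, and the function names `estimate_questions` and `minutes_per_question` are hypothetical helpers, not part of any NVIDIA tool.

```python
# Blueprint weights copied from the table above (percent of exam).
BLUEPRINT = {
    "System and Server Bring-up": 31,
    "Physical Layer Management": 5,
    "Control Plane Installation and Configuration": 19,
    "Cluster Test and Verification": 33,
    "Troubleshoot and Optimize": 12,
}


def estimate_questions(total_questions):
    """Estimate questions per topic area.

    Assumption (not stated by the source): questions are distributed
    in proportion to each area's blueprint percentage.
    """
    return {
        area: round(total_questions * pct / 100, 1)
        for area, pct in BLUEPRINT.items()
    }


def minutes_per_question(total_questions, duration_minutes=120):
    """Average time available per question for the 120-minute exam."""
    return round(duration_minutes / total_questions, 2)


if __name__ == "__main__":
    # The five weights should account for the whole exam.
    assert sum(BLUEPRINT.values()) == 100

    # The exam details list 70-75 questions, so show both ends of the range.
    for total in (70, 75):
        print(f"{total} questions, about {minutes_per_question(total)} min each")
        for area, count in estimate_questions(total).items():
            print(f"  {area}: ~{count}")
```

Under this proportional assumption, System and Server Bring-up together with Cluster Test and Verification make up 64% of the exam, so those two areas are a sensible place to concentrate preparation time.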

Contact Us

NVIDIA offers training and certification for professionals looking to enhance their skills and knowledge in AI, accelerated computing, data science, advanced networking, graphics, simulation, and more.

Contact us to learn how we can help you achieve your goals.
