
Instructor-Led Workshop
Accelerating CUDA C++ Applications with Multiple GPUs

Request a Workshop

View public workshops

Computationally intensive CUDA® C++ applications in high-performance computing, data science, bioinformatics, and deep learning can be accelerated by using multiple GPUs, which can increase throughput, decrease total runtime, or both. When combined with the concurrent overlap of computation and memory transfers, computation can be scaled across multiple GPUs without increasing the cost of memory transfers. For organizations with multi-GPU servers, whether in the cloud or on NVIDIA DGX™ systems, these techniques enable you to achieve peak performance from GPU-accelerated applications. It is also important to implement these single-node, multi-GPU techniques before scaling your applications across multiple nodes.

This workshop covers how to write CUDA C++ applications that efficiently and correctly utilize all available GPUs in a single node, dramatically improving the performance of your applications and making the most cost-effective use of systems with multiple GPUs.

Learning Objectives

By participating in this workshop, you’ll:

Use concurrent CUDA streams to overlap memory transfers with GPU computation
Utilize all available GPUs on a single node to scale workloads across them
Combine the use of copy/compute overlap with multiple GPUs
Use the NVIDIA Nsight™ Systems Visual Profiler timeline to identify improvement opportunities and observe the impact of the techniques covered in the workshop
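
As a rough illustration of the first objective, copy/compute overlap usually starts by splitting the data into chunks, one per stream, so that one chunk can be transferred while another is being processed. The following is a minimal host-side sketch in plain C++, not the workshop's own code: the `Chunk` type and the `make_chunks` helper are hypothetical names introduced here, and the CUDA calls appear only in comments. It uses round-up division so that sizes that do not divide evenly are still covered.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: one contiguous slice of the data.
struct Chunk {
    std::size_t offset;  // index of the first element in the chunk
    std::size_t size;    // number of elements in the chunk
};

// Split n elements into at most num_chunks slices.
// Round-up division keeps every element covered; the last slice may be
// shorter, and fewer slices are produced when n is small.
std::vector<Chunk> make_chunks(std::size_t n, std::size_t num_chunks) {
    assert(num_chunks > 0);
    std::vector<Chunk> chunks;
    const std::size_t chunk_size = (n + num_chunks - 1) / num_chunks;
    for (std::size_t lower = 0; lower < n; lower += chunk_size) {
        // Clamp the upper bound so the final chunk never runs past n.
        const std::size_t upper = std::min(lower + chunk_size, n);
        chunks.push_back({lower, upper - lower});
        // In a CUDA application, each chunk would typically be handed to
        // its own non-default stream: an asynchronous host-to-device copy
        // (cudaMemcpyAsync), a kernel launch, and an asynchronous
        // device-to-host copy, all issued in that stream.
    }
    return chunks;
}
```

For example, `make_chunks(10, 4)` yields chunks of 3, 3, 3, and 1 elements. Clamping the upper bound, rather than assuming the size divides evenly, is what keeps the final chunk from reading or writing out of range.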

Download workshop datasheet (PDF 243 KB)

Workshop Outline

Introduction (15 mins)	Meet the instructor. Create an account at courses.nvidia.com/join.
Using JupyterLab (15 mins)	Get familiar with your GPU-accelerated interactive JupyterLab environment.
Application Overview (15 mins)	Orient yourself with a single GPU CUDA C++ application that will be the starting point for the course. Observe the current performance of the single GPU CUDA C++ application using Nsight Systems.
Introduction to CUDA Streams (90 mins)	Learn the rules that govern concurrent CUDA stream behavior. Use multiple CUDA streams to perform concurrent host-to-device and device-to-host memory transfers. Utilize multiple CUDA streams for launching GPU kernels. Observe multiple streams in the Nsight Systems Visual Profiler timeline view.
Break (60 mins)
Copy/Compute Overlap with CUDA Streams (90 mins)	Learn the key concepts for effectively performing copy/compute overlap. Explore robust indexing strategies for the flexible use of copy/compute overlap in applications. Refactor the single-GPU CUDA C++ application to perform copy/compute overlap. See copy/compute overlap in the Nsight Systems Visual Profiler timeline.
Multiple GPUs with CUDA C++ (60 mins)	Learn the key concepts for effectively using multiple GPUs on a single node with CUDA C++. Explore robust indexing strategies for the flexible use of multiple GPUs in applications. Refactor the single-GPU CUDA C++ application to utilize multiple GPUs. See multiple-GPU utilization in the Nsight Systems Visual Profiler timeline.
Break (15 mins)
Copy/Compute Overlap with Multiple GPUs (60 mins)	Learn the key concepts for effectively performing copy/compute overlap on multiple GPUs. Explore robust indexing strategies for the flexible use of copy/compute overlap on multiple GPUs. Refactor the single-GPU CUDA C++ application to perform copy/compute overlap on multiple GPUs. Observe performance benefits for copy/compute overlap on multiple GPUs. See copy/compute overlap on multiple GPUs in the Nsight Systems Visual Profiler timeline.
Course Assessment (30 mins)
Final Review (30 mins)	Review key learnings. Learn to build your own training environment from the DLI base environment container. Complete the workshop survey.
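
The indexing strategies named in the outline for copy/compute overlap on multiple GPUs can be pictured as a two-level split: the data is first divided among GPUs, and each GPU's share is then divided among its streams. The sketch below is one possible way to compute those ranges on the host in plain C++. It is an assumption-laden illustration rather than the workshop's material: `Slice` and `partition` are hypothetical names, and the CUDA runtime calls are mentioned only in comments.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record: which GPU and stream own which range of elements.
struct Slice {
    int gpu;
    int stream;
    std::size_t offset;  // index of the first element in the slice
    std::size_t size;    // number of elements in the slice
};

// Divide n elements among GPUs, then divide each GPU's share among streams.
// Both levels use round-up division and clamp at the upper bound, so sizes
// that do not divide evenly are still fully covered without overlap.
std::vector<Slice> partition(std::size_t n, int num_gpus, int streams_per_gpu) {
    assert(num_gpus > 0 && streams_per_gpu > 0);
    std::vector<Slice> slices;
    const std::size_t gpus = static_cast<std::size_t>(num_gpus);
    const std::size_t streams = static_cast<std::size_t>(streams_per_gpu);
    const std::size_t gpu_chunk = (n + gpus - 1) / gpus;
    const std::size_t stream_chunk = (gpu_chunk + streams - 1) / streams;
    if (stream_chunk == 0) return slices;  // nothing to do when n == 0

    for (std::size_t g = 0; g < gpus; ++g) {
        const std::size_t gpu_lower = g * gpu_chunk;
        const std::size_t gpu_upper = std::min(gpu_lower + gpu_chunk, n);
        // In CUDA code, cudaSetDevice would select GPU g at this point.
        for (std::size_t s = 0; s < streams; ++s) {
            const std::size_t lower = gpu_lower + s * stream_chunk;
            if (lower >= gpu_upper) break;  // this GPU's share is exhausted
            const std::size_t upper = std::min(lower + stream_chunk, gpu_upper);
            // Copies and kernels for [lower, upper) would be issued
            // asynchronously in stream s of GPU g.
            slices.push_back({static_cast<int>(g), static_cast<int>(s),
                              lower, upper - lower});
        }
    }
    return slices;
}
```

With 10 elements, 2 GPUs, and 2 streams per GPU, this produces four slices of 3, 2, 3, and 2 elements, starting at offsets 0, 3, 5, and 8. Keeping the GPU-level and stream-level bounds separate means a stream's slice is clamped to its own GPU's range, not just to the end of the data.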

Workshop Details

Duration: 8 hours

Price: Contact us for pricing.

Prerequisites:

Professional experience programming CUDA C/C++ applications, including the use of the nvcc compiler, kernel launches, grid-stride loops, host-to-device and device-to-host memory transfers, and CUDA error handling
Familiarity with the Linux command line
Experience using makefiles to compile C/C++ code

Suggested resources to satisfy prerequisites: Fundamentals of Accelerated Computing with CUDA C/C++, Ubuntu Command Line for Beginners (sections 1 through 5), Makefile Tutorial (through the Simple Examples section)

Technologies: CUDA C++, Nsight Systems

Certificate: Upon successful completion of the assessment, participants will receive an NVIDIA DLI certificate to recognize their subject matter competency and support professional career growth.

Hardware Requirements: Desktop or laptop computer capable of running the latest version of Chrome or Firefox. Each participant will be provided with dedicated access to a fully configured, GPU-accelerated server in the cloud.

Languages: English, Simplified Chinese

QUESTIONS?

Contact Us for Questions on
Deep Learning Training


Read our frequently asked questions.

Learn about the services of the NVIDIA Deep Learning Institute.

If you have technical questions, visit the NVIDIA Developer Forums.
