Data engineering is the foundation of data science and lays the groundwork for analysis and modeling. In order for organizations to extract knowledge and insights from structured and unstructured data, fast access to accurate and complete datasets is critical. Working with massive amounts of data from disparate sources requires complex infrastructure and expertise. Minor inefficiencies can result in major costs, both in terms of time and money, when scaled across millions to trillions of data points.

In this workshop, we’ll explore how GPUs can improve data pipelines and how using advanced data engineering tools and techniques can result in significant performance acceleration. Faster pipelines produce fresher dashboards and machine learning (ML) models, so users can have the most current information at their fingertips.


Learning Objectives


By participating in this workshop, you’ll learn:
  • How data moves within a computer, and how to strike the right balance between CPUs, DRAM, disk storage, and GPUs.
  • How different file formats can be read and manipulated by hardware.
  • How to scale an ETL pipeline with multiple GPUs using NVTabular.
  • How to build an interactive Plotly dashboard where users can filter on millions of data points in less than a second.
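To give a sense of the last objective, vectorized filtering is what makes sub-second interaction on millions of points possible; even on a CPU a single boolean mask scans millions of rows in milliseconds, and GPU dataframes accelerate the same pattern further. A minimal NumPy sketch with synthetic data (not from the workshop):

```python
import numpy as np

# Synthetic dataset: 5 million "precipitation" readings with latitudes
rng = np.random.default_rng(0)
n = 5_000_000
lat = rng.uniform(-90, 90, n)
precip = rng.exponential(2.0, n)

# Vectorized boolean mask: select tropical, heavy-rain points in one pass
mask = (np.abs(lat) < 23.5) & (precip > 5.0)
subset = precip[mask]
print(subset.size, "points matched")
```

The same mask-then-slice idiom carries over directly to pandas and cuDF dataframes backing a dashboard.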

Download workshop datasheet (PDF 318 KB)

Workshop Outline

Introduction
(15 mins)
Data on the Hardware Level
(60 mins)
    Explore the strengths and weaknesses of different hardware approaches to data and the frameworks that support them:
    • Pandas
    • cuDF
    • Dask
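A key reason these three frameworks are taught together is that cuDF deliberately mirrors the pandas API, and Dask distributes that same API across workers. As a sketch (pandas shown; on a CUDA-capable machine, swapping the import is often the only change needed):

```python
import pandas as pd
# import cudf as pd  # on a GPU machine, this one line moves the work to cuDF

df = pd.DataFrame({
    "store": ["A", "B", "A", "B", "A"],
    "sales": [10.0, 20.0, 5.0, 15.0, 30.0],
})

# The same groupby/aggregate call works in pandas, cuDF, and Dask DataFrame
totals = df.groupby("store")["sales"].sum()
print(totals)
```

The column names and data here are illustrative, not part of the workshop materials.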
Break (15 mins)
ETL with NVTabular
(120 mins)
    Learn how to scale an ETL pipeline from one GPU to many with NVTabular, through the lens of a big data recommender system.
    • Transform raw JSON into analysis-ready Parquet files
    • Learn how to quickly add features to a dataset using operators such as Categorify and Lambda
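NVTabular expresses transforms like Categorify as operators applied to column groups on the GPU. As a CPU-side analogue of the two bullets above, here is the same idea in plain pandas: parse JSON-lines input, then map string categories to integer codes the way Categorify does for recommender embeddings (the column names and data are made up for illustration):

```python
import io
import pandas as pd

# Raw JSON-lines input, as it might arrive from interaction logs (toy data)
raw = io.StringIO('{"user": "u1", "item": "i9"}\n{"user": "u2", "item": "i9"}\n')
df = pd.read_json(raw, lines=True)

# "Categorify" analogue: encode string categories as contiguous integer codes
for col in ["user", "item"]:
    df[col + "_id"] = df[col].astype("category").cat.codes

# The analysis-ready result would then be written columnar, e.g.:
# df.to_parquet("interactions.parquet")  # requires pyarrow
print(df)
```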
Break (60 mins)
Data Visualization
(120 mins)
    Step into the shoes of a meteorologist and learn how to plot precipitation data on a map.
    • Learn how to use descriptive statistics and plots like histograms in order to assess data quality
    • Learn effective memory usage, so users can quickly filter data through a graphical interface
Final Project: Data Detective
(60 mins)
    Users are complaining that the dashboard is too slow. Apply the techniques learned in class to find and eliminate inefficiencies in the backend code.
Final Review
(15 mins)
  • Review key learnings and answer questions.
  • Complete the assessment and earn your certificate.
  • Complete the workshop survey.
  • Learn how to set up your own AI application development environment.

Workshop Details

Duration: 8 hours

Price: $500 for public workshops; contact us for enterprise workshops.

Prerequisites:

Technologies: pandas, cuDF, Dask, NVTabular, Plotly

Assessment Type: Skills-based coding assessments evaluate your ability to efficiently filter through millions of data points in the context of an interactive dashboard.

Certificate: Upon successful completion of the assessment, you’ll receive an NVIDIA DLI certificate to recognize your subject matter competency and support your professional career growth.

Hardware Requirements: You’ll need a desktop or laptop computer capable of running the latest version of Chrome or Firefox. You’ll be provided with dedicated access to a fully configured, GPU-accelerated workstation in the cloud.

Languages: English

Upcoming Workshops

If your organization is interested in boosting and developing key skills in AI, accelerated data science, or accelerated computing, you can request instructor-led training from the NVIDIA DLI.

Continue Your Learning with These DLI Trainings

Fundamentals of Accelerated Data Science

High-Performance Computing with Containers

Questions?