NVIDIA at Dask Distributed Summit

Wednesday, May 19–Friday, May 21, 2021

Dask is an important component of the accelerated data science ecosystem. By pairing Dask with RAPIDS™, data scientists can scale out to multi-node, multi-GPU clusters, creating a large-scale, enterprise-grade solution that generates valuable insights and makes the most of their data. Join us at the inaugural Dask Distributed Summit to explore this powerful solution and other advancements in accelerated data science.

What Is the Dask Distributed Summit?

The Dask Distributed Summit is a three-day virtual event comprising talks, tutorials, and discussions to facilitate a better understanding of Dask, its capabilities, and its uses. The Summit welcomes attendees with a wide variety of experience, expertise, and backgrounds to create a diverse and compelling learning opportunity. Users, contributors, and newcomers can share experiences and learn from one another to solve hard problems and grow together. To learn more, visit summit.dask.org.

NVIDIA Talks and Tutorials

Workshop: Using GPUs to Accelerate Data Science with Dask + RAPIDS

Thursday, May 20, 12:00 p.m. ET / 9:00 a.m. PT

Hear from NVIDIA practitioners and Dask contributors about how Dask + RAPIDS is supercharging data science on NVIDIA accelerated compute. In this workshop, the team will discuss how Dask + RAPIDS empowers practitioners, how to get started with these tools quickly, and how they’re used to solve common challenges. Attendees of all skill levels and degrees of familiarity are welcome!

NVIDIA Speakers: Jake Schmitt (Product Marketing Manager), Vibhu Jawa (Data Scientist), Devin Robison (Data Scientist), Peter Entschev (Senior System Software Engineer)

Tutorial: Bring Dask Workloads to GPUs with RAPIDS

Thursday, May 20, 2:30 p.m. ET / 11:30 a.m. PT

Learn how to use GPUs to power your data science workloads with Ben Zaitlen and Nick Becker. This tutorial introduces RAPIDS and illustrates how to use Dask + RAPIDS to accelerate extract, transform, load (ETL) and machine learning workloads, increasing performance and decreasing total cost. RAPIDS and Dask both follow the standards set by the PyData ecosystem, so anyone familiar with pandas, scikit-learn, etc., will find this tutorial helpful.

NVIDIA Speakers: Benjamin Zaitlen (Software Manager) and Nick Becker (Senior Software Engineer)

Talk: NVTabular—Building a Dask-Based Library for Recommender System Data Pipelines

Wednesday, May 19, 3:30 p.m. ET / 12:30 p.m. PT

NVIDIA is building powerful tools on top of Dask + RAPIDS. Join Rick Zamora and learn about NVTabular, a recommender system-focused feature engineering and preprocessing library for tabular data. This talk will describe how NVTabular was built entirely on Dask-DataFrame to both simplify and accelerate model-training pipelines.

NVIDIA Speaker: Rick Zamora (Senior Systems Software Engineer)

Talk: GPU-Accelerated Streaming at Scale Using Dask

Friday, May 21, 12:30 p.m. ET / 9:30 a.m. PT

Hear from Chinmay Chandak and Jarod Maupin about NVIDIA’s strides in large-scale stream processing. cuStreamz, built upon Dask + RAPIDS, provides a reliable, cost-effective streaming solution and is used at NVIDIA for intensive operations. This talk covers how NVIDIA is leveraging Dask to GPU-accelerate big data stream processing at scale in production.

NVIDIA Speakers: Chinmay Chandak (Software Engineer) and Jarod Maupin (Software Engineer)

Talk: An Introduction to Memory Spilling

Wednesday, May 19, 12:00 p.m. ET / 9:00 a.m. PT

Memory spilling is an important feature that makes it possible to run Dask applications that would otherwise run out of memory. When memory runs low, Dask automatically moves data from GPU memory to main memory, from main memory to disk, or both. Listen to Mads R. B. Kristensen’s talk to learn how spilling works, what its shortcomings are, and how a new Dask-CUDA® approach overcomes them.

NVIDIA Speaker: Mads R. B. Kristensen (Senior Software Engineer)
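
To make the idea of spilling concrete, here is a minimal, self-contained sketch in plain Python. It is not Dask's or Dask-CUDA's implementation: the `SpillingStore` class, its `max_in_memory` limit, and the use of pickle files are all hypothetical choices made for illustration. It models only one hop (a fast in-memory tier spilling to disk) and counts items rather than bytes, whereas the talk concerns spilling driven by actual memory usage across GPU memory, main memory, and disk.

```python
import itertools
import os
import pickle
import tempfile
from collections import OrderedDict


class SpillingStore:
    """Toy key-value store that spills least-recently-used values to disk.

    Illustrative only: real spilling is driven by memory usage, not item count.
    """

    def __init__(self, max_in_memory, spill_dir):
        self.max_in_memory = max_in_memory  # hypothetical limit, counted in items
        self.spill_dir = spill_dir
        self.fast = OrderedDict()           # stands in for the faster memory tier
        self.spilled = {}                   # key -> path of the pickled value on disk
        self._ids = itertools.count()       # unique file names, independent of keys

    def _evict_if_needed(self):
        # Move the least recently used values to disk until we are under the limit.
        while len(self.fast) > self.max_in_memory:
            key, value = self.fast.popitem(last=False)
            path = os.path.join(self.spill_dir, f"{next(self._ids)}.pkl")
            with open(path, "wb") as f:
                pickle.dump(value, f)
            self.spilled[key] = path

    def put(self, key, value):
        # Drop any stale spilled copy before storing the new value.
        stale = self.spilled.pop(key, None)
        if stale is not None:
            os.remove(stale)
        self.fast[key] = value
        self.fast.move_to_end(key)
        self._evict_if_needed()

    def get(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)      # mark as recently used
            return self.fast[key]
        path = self.spilled.pop(key)        # raises KeyError for unknown keys
        with open(path, "rb") as f:
            value = pickle.load(f)
        os.remove(path)
        self.fast[key] = value              # "unspill" back into the fast tier
        self._evict_if_needed()
        return value


with tempfile.TemporaryDirectory() as tmp:
    store = SpillingStore(max_in_memory=2, spill_dir=tmp)
    for name in ("a", "b", "c"):
        store.put(name, list(range(3)))
    print(sorted(store.spilled))   # keys currently on disk
    print(store.get("a"))          # transparently read back from disk
```

Reading a spilled value brings it back into the fast tier and may push another value out. That round-trip cost is one reason spilling, while it keeps a workload from failing, can also slow it down.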

Our Dask Community Talks

Workshop: Doing Nothing Poorly—Accelerating Dask Scheduling

Friday, May 21, 2:00 p.m. ET / 11:00 a.m. PT

Listen in on a highly collaborative workshop about recent updates to the Dask Scheduler. This workshop will discuss scheduler internals, motivating problems where scaling is difficult, and how the Dask community is moving forward to improve performance.

NVIDIA Speakers: Benjamin Zaitlen, John Kirkham (Senior Systems Software Engineer), Mads R. B. Kristensen, and Rick Zamora (Senior Systems Software Engineer)

Workshop: Deploying Dask

Wednesday, May 19, 12:00 p.m. ET / 9:00 a.m. PT

Join NVIDIA’s Jacob Tomlinson and other Dask contributors to learn about the most common methods for deploying Dask today. After an overview of all the moving pieces within a Dask cluster (client, cluster, scheduler, workers), they’ll talk through various platforms and the tools used to deploy Dask onto them, along with benefits, common challenges, and pitfalls.

NVIDIA Speaker: Jacob Tomlinson (Senior Software Engineer)

Workshop: High-Performance Data Access for Dask

Friday, May 21, 10:00 a.m. ET / 7:00 a.m. PT

Dask contains many functions for data input/output (I/O) for arrays and dataframes. In this workshop, listen to Rick Zamora and other Dask contributors as they discuss the current status of various data format integrations for Dask and, more generally, the landscape of parallel and cloud-friendly data storage.

NVIDIA Speaker: Rick Zamora (Senior Systems Software Engineer)

Talks from Our Partners

Listen to talks from NVIDIA Partners Using Dask and RAPIDS

Scale Model Training in Minutes with Dask + RAPIDS on GCP AI Platform

Friday, May 21, 2:30 p.m. ET / 11:30 a.m. PT

Dask allows users to scale their Python code. However, provisioning the machines needed to run that code is not usually easy. Google Cloud’s AI Platform provides a simplified experience for provisioning machines and clusters, making it easy to get started with Dask + RAPIDS quickly. The talk will show examples of using Google Cloud’s AI Platform to run a variety of jobs using Dask + RAPIDS on a variety of machine types.

Clusters of Clusters - Using Dask Distributed to Scale Enterprise Machine Learning Systems

Wednesday, May 19, 9:00 a.m. ET / 6:00 a.m. PT

Learn how Walmart uses Dask and other powerful open source tools to enable their Data Science community to operate at scale. The past decade has shown there is a steep learning curve for organizations trying to scale and productionalize ML systems quickly. Walmart has developed several principles over the years to address this challenge. In this talk, Grant Gelven will discuss these principles and the open-source tools that enable Walmart today.

Capital One Uses Dask!

Friday, May 21, 3:00 p.m. ET / 12:00 p.m. PT

Capital One processes tremendous volumes of data every day to drive everything from credit decisions to fraud detection to call transcription. By enabling distributed computing in the PyData ecosystem, Dask makes it easier to handle more data in less time, resulting in more experiments, faster and better decisions, and quicker product delivery to their customers. The rich ecosystem of libraries, including Dask-ML, RAPIDS, XGBoost, and internal libraries, makes this possible.

Architectures for Scalable Analytic Dashboards in Python with Dask Distributed and Dash

Thursday, May 20, 12:00 p.m. ET / 9:00 a.m. PT

Plotly-Dash is a framework for developing analytic web apps in Python. This talk will describe Dash’s design and how Plotly-Dash enables efficient scaling to support large numbers of simultaneous users. Several architectures will be presented that can be used to combine the strengths of Dash with the strengths of Dask Distributed + RAPIDS to create apps that scale to support large datasets and many users.

Come learn more about NVIDIA’s work in accelerated data science and the Dask community at the 2021 Dask Summit. The program schedule highlights a variety of excellent talks that a broad audience will find exciting.