
NVIDIA at Dask Distributed Summit

Wednesday, May 19–Friday, May 21, 2021

Dask is an important component of the accelerated data science ecosystem. By pairing Dask with RAPIDS™, data scientists can scale out to multi-node, multi-GPU clusters, creating a large-scale, enterprise-grade solution to generate valuable insights and make the most out of data. Check out NVIDIA’s presentations at the inaugural Dask Distributed Summit to explore this powerful set of solutions and other advancements in accelerated data science.

What Is the Dask Distributed Summit?

The Dask Distributed Summit is an annual virtual event comprising talks, tutorials, and discussions that build a better understanding of Dask, its capabilities, and its uses. The Summit welcomes attendees with a wide variety of experience, expertise, and backgrounds to create a diverse and compelling learning opportunity. Users, contributors, and newcomers can share experiences and learn from one another to solve hard problems and grow together. To learn more, visit summit.dask.org.

NVIDIA Talks and Tutorials


Workshop: Using GPUs to Accelerate Data Science with Dask + RAPIDS

Hear from NVIDIA practitioners and Dask contributors about how Dask + RAPIDS is supercharging data science on NVIDIA accelerated compute. In this workshop, the team discusses how Dask + RAPIDS empowers practitioners, how to get started with these tools quickly, and how they’re used to solve common challenges. Attendees of all skill levels and degrees of familiarity are welcome!

NVIDIA Speakers: Jake Schmitt (Product Marketing Manager), Vibhu Jawa (Data Scientist), Devin Robison (Data Scientist), and Peter Entschev (Senior System Software Engineer)

Tutorial: Bring Dask Workloads to GPUs with RAPIDS

Learn how to use GPUs to power your data science workloads with Ben Zaitlen and Nick Becker. This tutorial introduces RAPIDS and illustrates how to use Dask + RAPIDS to accelerate extract, transform, load (ETL) and machine learning workloads, increasing performance and decreasing total cost. RAPIDS and Dask both follow the standards set by the PyData ecosystem, so anyone familiar with pandas, scikit-learn, etc., will find this tutorial helpful.

NVIDIA Speakers: Benjamin Zaitlen (Software Manager) and Nick Becker (Senior Software Engineer)


Talk: NVTabular—Building a Dask-Based Library for Recommender System Data Pipelines

NVIDIA builds powerful tools on top of Dask + RAPIDS. Listen to Rick Zamora and learn about NVTabular, a recommender system-focused feature engineering and preprocessing library for tabular data. This talk describes how NVTabular was built entirely on Dask-DataFrame to both simplify and accelerate model-training pipelines.

NVIDIA Speaker: Rick Zamora (Senior Systems Software Engineer)

Talk: GPU-Accelerated Streaming at Scale Using Dask

Hear from Chinmay Chandak and Jarod Maupin about NVIDIA’s strides in large-scale stream processing. cuStreamz, built upon Dask + RAPIDS, provides a reliable, cost-effective streaming solution and is used at NVIDIA for intensive operations. This talk discusses how NVIDIA is leveraging Dask to GPU-accelerate big data stream processing at scale in production.

NVIDIA Speakers: Chinmay Chandak (Software Engineer) and Jarod Maupin (Software Engineer)


Talk: An Introduction to Memory Spilling

Memory spilling is an important feature that makes it possible to run Dask applications that would otherwise run out of memory. When low on memory, Dask automatically moves data from GPU memory to main memory and/or from main memory to disk. Listen to Mads R. B. Kristensen’s talk to learn how spilling works, what its shortcomings are, and how a new Dask-CUDA® approach overcomes them.

NVIDIA Speaker: Mads R. B. Kristensen (Senior Software Engineer)
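To make the idea of tiered spilling concrete, here is a minimal, standard-library-only sketch. It is not how Dask or Dask-CUDA implement spilling: the `SpillingStore` class, its "device" and "host" tiers, the item-count limits, and the least-recently-used eviction policy are all hypothetical stand-ins chosen for illustration. It only shows the general pattern described above, where data overflows from a fast tier to a slower one and finally to disk, and is promoted back when accessed.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class SpillingStore:
    """Toy three-tier store: 'device' -> 'host' -> disk (illustrative only)."""

    def __init__(self, device_limit, host_limit, spill_dir):
        self.device_limit = device_limit  # max items in the fast tier (assumed unit: items)
        self.host_limit = host_limit      # max items in the middle tier
        self.spill_dir = spill_dir
        self.device = OrderedDict()       # least recently used key comes first
        self.host = OrderedDict()
        self.on_disk = {}                 # key -> path of the pickled value
        self._counter = 0                 # used to build unique spill file names

    def put(self, key, value):
        # Remove stale copies from the slower tiers, then store in the fast tier.
        self.host.pop(key, None)
        path = self.on_disk.pop(key, None)
        if path is not None:
            os.remove(path)
        self.device[key] = value
        self.device.move_to_end(key)
        self._evict()

    def get(self, key):
        if key in self.device:
            self.device.move_to_end(key)
            return self.device[key]
        if key in self.host:
            value = self.host.pop(key)
        elif key in self.on_disk:
            path = self.on_disk.pop(key)
            with open(path, "rb") as f:
                value = pickle.load(f)
            os.remove(path)
        else:
            raise KeyError(key)
        # Promote the value back to the fast tier, spilling others if needed.
        self.device[key] = value
        self._evict()
        return value

    def _evict(self):
        # Spill least recently used items from 'device' to 'host'.
        while len(self.device) > self.device_limit:
            key, value = self.device.popitem(last=False)
            self.host[key] = value
        # Spill least recently used items from 'host' to disk.
        while len(self.host) > self.host_limit:
            key, value = self.host.popitem(last=False)
            self._counter += 1
            path = os.path.join(self.spill_dir, f"spill-{self._counter}.pkl")
            with open(path, "wb") as f:
                pickle.dump(value, f)
            self.on_disk[key] = path


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        store = SpillingStore(device_limit=2, host_limit=2, spill_dir=tmp)
        for i in range(5):
            store.put(f"chunk-{i}", [i] * 3)
        tiers = (list(store.device), list(store.host), sorted(store.on_disk))
        value = store.get("chunk-0")  # read back from disk and promote
        return tiers, value, list(store.device)


if __name__ == "__main__":
    print(demo())
```

In this toy, every access to spilled data pays a round trip through the slower tiers. That cost is one reason spilling strategies deserve careful design in a real system.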

Our Dask Community Talks


Workshop: Doing Nothing Poorly—Accelerating Dask Scheduling

Check out a highly collaborative workshop about recent updates to the Dask scheduler. This workshop covers scheduler internals, motivating problems where scaling is difficult, and how the Dask community is moving forward to improve performance.

NVIDIA Speakers: Benjamin Zaitlen, John Kirkham (Senior Systems Software Engineer), Mads R. B. Kristensen, and Rick Zamora (Senior Systems Software Engineer)

Workshop: Deploying Dask

Listen to NVIDIA’s Jacob Tomlinson and other Dask contributors to learn about the most common methods for deploying Dask today. After an overview of all the moving pieces within a Dask cluster (client, cluster, scheduler, workers), they talk through various platforms and the tools used to deploy Dask onto them, along with benefits, common challenges, and pitfalls.

NVIDIA Speaker: Jacob Tomlinson (Senior Software Engineer)


Workshop: High-Performance Data Access for Dask

Dask contains many functions for data input/output (IO) for arrays and dataframes. In this workshop, listen to Rick Zamora and other Dask contributors as they discuss the current status of various data format integrations for Dask and, more generally, the parallel/cloud-friendly data storage landscape.

NVIDIA Speaker: Rick Zamora (Senior Systems Software Engineer)

Talks from Our Partners

Listen to talks from NVIDIA Partners Using Dask and RAPIDS

Google Cloud

Scale Model Training in Minutes with Dask + RAPIDS on GCP AI Platform

Dask allows users to scale their Python code. However, provisioning the machines needed to run that code is usually not easy. Google Cloud’s AI Platform provides a simplified experience for provisioning machines and clusters, making it easy to get started with Dask + RAPIDS quickly. The talk will show examples of using Google Cloud’s AI Platform to run a range of jobs using Dask + RAPIDS on a variety of machine types.

Walmart Global Tech

Clusters of Clusters - Using Dask Distributed to Scale Enterprise Machine Learning Systems

Learn how Walmart uses Dask and other powerful open-source tools to enable their data science community to operate at scale. The past decade has shown there is a steep learning curve for organizations trying to scale and productionize ML systems quickly. Walmart has developed several principles over the years to address this challenge. In this talk, Grant Gelven will discuss these principles and the open-source tools that enable Walmart today.

Capital One

Capital One Uses Dask!

Capital One processes tremendous volumes of data every day to drive everything from credit decisions to fraud detection to call transcription. By enabling distributed computing in the PyData ecosystem, Dask makes it easier to handle more data in less time, resulting in more experiments, faster and better decisions, and expedient product delivery to their customers. The rich ecosystem of libraries, including Dask-ML, RAPIDS, XGBoost, internal libraries, and more, makes this possible.

Plotly

Architectures for Scalable Analytic Dashboards in Python with Dask Distributed and Dash

Plotly-Dash is a framework for developing analytic web apps in Python. This talk will describe Dash’s design and how Plotly-Dash enables efficient scaling to support large numbers of simultaneous users. Several architectures will be presented that can be used to combine the strengths of Dash with the strengths of Dask Distributed + RAPIDS to create apps that scale to support large datasets and many users.

Check out how RAPIDS, Dask, and NVIDIA compute are laying the groundwork for the future of data science. Learn more about NVIDIA’s work in accelerated data science and the Dask community from the 2021 Dask Summit.
