SCHEDULE

Catch up on SC19 talks featuring a broad range of disciplines such as weather forecasting, energy exploration, and molecular dynamics.

MONDAY 11/18

TUESDAY 11/19

  • 10:00 AM - 10:25 AM

    Driving Purdue University's Next Giant Leap in HPC and Data Science


    Preston Smith, Executive Director, Research Computing, Purdue University
    WATCH NOW
    VIEW PDF
    Data Analytics
       
    •  
       
      Researchers at Purdue University use GPUs to accelerate modeling, simulation, and the university's data science initiative. In this presentation, Executive Director for Research Computing Preston Smith will discuss how GPU technology supports Purdue researchers taking the next Giant Leap in HPC and data science.
  •  
  • 10:30 AM - 10:55 AM

    NVIDIA Deep Learning Institute and University Ambassador Program


    Joe Bungo, Deep Learning Institute Program Manager, NVIDIA
    Jian Tao, TEES Research Scientist / Computational Scientist / Adjunct Professor, Texas A&M University
    WATCH NOW
    VIEW PDF
    AI / Machine Learning
       
    •  
       
      The NVIDIA Deep Learning Institute (DLI) offers hands-on training in AI and accelerated computing to solve real-world problems. Developers, data scientists, researchers, and students can get practical experience powered by GPUs in the cloud and earn a certificate of competency to support professional growth. DLI offers self-paced, online training for individuals, instructor-led workshops for teams, and downloadable course materials for university educators. The DLI University Ambassador Program enables qualified educators to teach DLI workshops at university campuses and academic conferences to faculty, students, and researchers at no cost, complementing the traditional theoretical approaches to university education in machine learning, data science, AI, and parallel computing.
  •  
  • 11:00 AM - 11:25 AM

    Making your Supercomputer Smarter with Mellanox BlueField


    Gil Bloch, Principal Architect, Mellanox Technologies
       
    •  
       
      On the road to exascale supercomputers, there are new challenges to solve. This requires new system architectures that move from homogeneous CPU-centered systems to heterogeneous systems. New processing engines such as GPUs enable greater processing power, and at the same time the network becomes a more important part of the system. With better co-design, the network is required to perform smarter and more complex operations beyond traditional data movement. In this talk we will present how BlueField smart networking devices can change the boundaries between CPU and network and between software and hardware to enable greater scalability and performance for your supercomputer.
  •  
  • 11:30 AM - 11:55 AM

    Supercharging Quantum Chemistry with Deep Learning


    Adrian Roitberg, Full Professor, Department of Chemistry, University of Florida
    WATCH NOW
    Software / Developers
       
    •  
       
      In modeling chemical spaces for drug design, catalysis, etc., one always struggles with the fact that very accurate calculations based on quantum chemistry have a very high computational cost. Cheaper approximations to true quantum chemistry are less accurate, so they are less useful. We show that a deep learning algorithm, running on NVIDIA GPUs, can be trained on a large database of quantum energies for small molecules, and the resulting networks are at the same time highly accurate and extremely fast, with speedups versus quantum chemistry of up to 10^8. This breakthrough opens the possibility of modeling processes such as drug binding and chemical reactions at a cost and accuracy previously thought impossible to achieve.
  •  
  • 12:00 PM - 12:25 PM

    High Performance Deep Learning Clusters


    Julie Bernauer, Director, Deep Learning Systems Engineering, NVIDIA
    WATCH NOW
    VIEW PDF
    AI / Machine Learning
       
    •  
       
      The largest supercomputers in the world are now designed with AI in mind, and enterprise and AI research systems are being designed more like supercomputers. Scaling is a core part of modern AI frameworks and methodologies. Designing and running a compute infrastructure that provides maximum performance with constantly changing software stacks at a limited cost for scale-out is a challenge. While single GPU and single machine performance is now easy to attain, larger scale systems on the order of 1500+ GPUs require focusing on large numbers of putative bottlenecks at the design stage and also in everyday operations. These issues are usually ignored by users and software developers and therefore have to be made as transparent and efficient as possible: interconnect design, filesystem performance, power balancing, thermal control, job scheduling, software for management, and software versatility in a stable and reliable, yet high performance, environment have to be addressed. In this talk, we will showcase the Superpod as an example of rapid time to floor for an AI performance infrastructure, and how modern AI frameworks and models are built to leverage these systems.
  •  
  • 12:30 PM - 12:55 PM
    WATCH NOW
    VIEW PDF
    Accelerated Computing
       
    •  
       
      The IceCube Neutrino Observatory is the National Science Foundation’s (NSF) premier facility to detect neutrinos with energies above approximately 10 GeV and a pillar of NSF’s Multi-Messenger Astrophysics (MMA) program, one of NSF’s 10 Big Ideas. The detector is located at the geographic South Pole and is designed to detect interactions of neutrinos of astrophysical origin by instrumenting over a gigaton of polar ice with 5160 optical sensors. The sensors are buried between 1450 and 2450 meters below the surface of the South Pole ice sheet. To understand the impact of ice properties on incoming neutrino detection and origin, photon propagation simulations on GPUs are used. We report on a few-hour GPU burst across Amazon Web Services, Microsoft Azure, and Google Cloud Platform that harvested all GPUs available for sale across the three cloud providers the weekend before SC19. GPU types span the full range of generations from NVIDIA GRID K520 to the most modern NVIDIA T4 and V100. We report the scale and science performance achieved across all the various GPU types, as well as the science motivation to do so.
  •  
  • 1:00 PM - 1:25 PM
    WATCH NOW
    AI / Machine Learning
       
    •  
       
      Merlin is a workflow framework that enables orchestrating multi-machine, multi-batch, large-scale science simulation ensembles with in-situ postprocessing, deep learning at scale using LBANN, surrogate-model-driven sampling, and data exploration. This session describes how Merlin was used to combine hydrodynamics simulations for ICF with LBANN for training and iterative feedback.
  •  
  • 1:30 PM - 1:55 PM
    WATCH NOW
    VIEW PDF
    Software / Developers
       
    •  
       
      Learn the best way to introduce Tensor Core acceleration in HPC applications, followed by a quick introduction to Tensor Core architecture and functionality. This session will also present case studies of HPC applications using Tensor Cores.
  •  
  • 2:00 PM - 2:25 PM

    Exascale Biology: From Genome to Climate with a Few Steps along the Way


    Dan Jacobson, Chief Scientist for Computational Systems Biology, ORNL
    WATCH NOW
    Data Analytics
       
    •  
       
      Supercomputing is playing a key role in our efforts to understand complex biological systems. To date we have performed calculations on the Summit supercomputer at OLCF with two different algorithms achieving 2.41 exaflops and 2.32 exaflops of mixed precision performance. The larger of these calculations required 22 zetta floating-point operations to complete. The cost of generating biological data is dropping exponentially, resulting in increased data that has far outstripped the predicted growth in computational power from Moore’s Law. This flood of data has opened a new era of systems biology in which there are unprecedented opportunities to gain insights into complex biological systems. Integrated biological models need to capture the higher order complexity of the interactions among cellular components. Solving such complex combinatorial problems will give us extraordinary levels of understanding of biological systems. Paradoxically, understanding higher order sets of relationships among biological objects leads to a combinatorial explosion in the search space of biological data. These exponentially increasing volumes of data, combined with the desire to model more and more sophisticated sets of relationships within a cell, across an organism and up to ecosystems and, in fact, climatological scales, have led to a need for computational resources and sophisticated algorithms that can make use of such datasets. The result is a comprehensive systems biology model of an organism and how it has adapted to and responds to its abiotic and biotic environment, which has applications in bioenergy, precision agriculture, and ecosystem studies, among other disciplines.
  •  
  • 2:30 PM - 2:55 PM

    Porting VASP to GPUs using OpenACC


    Martijn Marsman, Sr. Scientist, University of Vienna
       
    •  
       
      A new and expanded GPU port of the Vienna Ab initio Simulation Package (VASP) for atomic scale materials modeling is now available. VASP is one of the most widely used codes for electronic-structure calculations and first-principles molecular dynamics. The blocked Davidson algorithm (including exact exchange) and RMM-DIIS for the real-space projection scheme were previously ported to CUDA C with good speed-ups on GPUs, but also with an increase in the maintenance workload because VASP is otherwise written entirely in Fortran. The new approach using OpenACC directives combined with calling NVIDIA libraries allowed us to extend GPU acceleration to the reciprocal projection and to direct minimizers with higher productivity and increased maintainability of the code, because we can focus on a unified code base for the CPU and GPU versions of VASP. It also allowed us to offer first-day GPU acceleration for features newly introduced into VASP, including adaptively compressed exchange (ACE) and a double-buffering implementation for hybrid DFT calculations. The performance relative to the CUDA port and CPU version will be discussed, as will the strong scaling with multiple GPUs. The performance we are able to deliver and the vastly decreased maintenance effort have led the VASP group to adopt OpenACC as the programming model for all future GPU porting of VASP.
  •  
    •  
    •  
       
      We’ll share early experiences using the PGI Fortran and C++ compilers on Arm and Rome processor-based systems with V100 GPUs. The PGI compilers support GPU Acceleration using OpenACC and CUDA. Support for mapping standard Fortran and C++ parallelism to GPUs and CPUs is coming soon. We’ll give an update on these PGI compiler features and more that simplify your on-ramp to GPU computing and maximize portability of your GPU-accelerated applications across all major CPU families.
  •  
  • 3:30 PM - 3:55 PM

    Accelerating Supercomputing with Arm and NVIDIA


    David Lecomber, Senior Director, Infrastructure and HPC tools, Arm
    WATCH NOW
    VIEW PDF
    Software / Developers
       
    •  
       
      The announcement of the support of CUDA in the Arm ecosystem is opening up a new path forward for energy efficient AI-enabled supercomputing. NVIDIA and Arm are working together to provide the complete software stack for AI and HPC – and Arm’s compilers and tools for HPC developers will be available on this platform. We will demonstrate how to migrate applications to Arm + CUDA including tuning and optimizing the results with Arm Allinea Studio.
  •  
  • 4:00 PM - 4:25 PM

    Multidisciplinary AI Today and Tomorrow with Bridges-AI and Bridges-2


    Nick Nystrom, Chief Scientist, Pittsburgh Supercomputing Center / Carnegie Mellon University
    Paola Buitrago, Director, AI & Big Data, Pittsburgh Supercomputing Center / Carnegie Mellon University
    WATCH NOW
    AI / Machine Learning
       
    •  
       
      Research is evolving to be even more data-centric. AI is driving this change and is increasingly enabling breakthroughs. For analyzing large data, AI is helping to spot correlations and identify anomalies. For simulation and modeling, AI is reducing time to solution by orders of magnitude by replacing expensive computation with fast inferencing. This talk describes two unique platforms at the Pittsburgh Supercomputing Center that combine AI and HPC, at no cost for research and education. Bridges-AI, available today and an AI-focused extension to the Bridges supercomputer, features an NVIDIA DGX-2 and HPE Apollo 6500 servers, with 88 Volta GPUs total. Bridges-2 will build on Bridges and Bridges-AI to serve AI and AI-enabled simulation of tomorrow. To illustrate the systems’ impact, we will detail use cases in genomics and medical imaging, weather forecasting, agricultural sustainability, and other fields. Learn what’s possible, how to get access, and about opportunities for collaboration.
  •  
  • 4:30 PM - 4:55 PM
    WATCH NOW
    VIEW PDF
    Visualization
       
    •  
       
      We are working with NVIDIA to lower the barrier to scientific understanding by improving the communication tools that scientists have access to. NVIDIA has been working with Kitware not only to bring NVIDIA RTX support to ParaView, but also to allow ParaView users to access Omniverse. Come see how advancements in ParaView will unlock the next generation of visualization communication and collaboration techniques for your science.
  •  
  • 5:00 PM - 5:25 PM

    CUDA Developer Tools: New Features and Capabilities


    Rafael Campana, Software Engineering Director, Compute Developer Tools, NVIDIA
    WATCH NOW
    VIEW PDF
    Software / Developers
       
    •  
       
      Come and learn about the wide range of Developer Tools that enable you to harness even more power from NVIDIA GPUs, and about their latest capabilities.
  •  
  • 5:30 PM - 5:55 PM
    WATCH NOW
    VIEW PDF
    Data Analytics
       
    •  
       
      Discover how Open OnDemand (OOD) can help lower the barrier to entry and ease access to computing resources for both new and existing users of HPC, big data, and analytics. OOD is an NSF-funded open-source HPC portal whose goal is to provide an easy way for system administrators to provide web access to their HPC resources and is in use at dozens of HPC centers. This presentation will touch upon the capabilities and architecture of OOD, installation experiences, priority of upcoming features such as customized workflows, training users, integration with other science gateways, and growing the community. Special emphasis will be on the joint efforts between NVIDIA engineers and the OOD project team to provide GPU specific metrics, accessibility and workflows to facilitate utilization of GPUs in HPC environments.
  •  

WEDNESDAY 11/20

  • 10:00 AM - 10:25 AM

    Petascale Reconstruction of Neural Connectivity


    Manuel Castro, Software Engineer, Princeton University
    WATCH NOW
    VIEW PDF
    AI / Machine Learning
       
    •  
       
      Connectomics is the study of how neurons are connected to each other in the brain. The neural connectivity of a brain can be thought of as a circuit; connectomics aims to discover and understand this circuit. Brain tissue must be imaged at very high levels of resolution to capture the relevant cellular structures. At a characteristic resolution of 4x4x40 nm, a cubic millimeter of tissue consists of 1.56 x 10^15 voxels (3D pixels). Therefore, connectomics requires the gathering and processing of vast amounts of data. A volume of brain tissue is typically processed by slicing it and imaging the slices with an electron microscope. After this, its connectivity can be reconstructed by determining which voxels belong to the same neuron, and detecting which neurons make synapses onto each other. At the SeungLab, we use convolutional neural networks, amongst other algorithms, to perform these tasks. Our largest reconstruction thus far is a petabyte-scale dataset, which we processed by distributing work across thousands of nodes and GPUs in the cloud. This process took approximately 1.5 months and resulted in a reconstruction with on the order of 10^5 neurons and 10^9 synapses. Manuel will present an overview of the computational reconstruction pipeline, with an emphasis on the use of distributed computing and storage.
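The voxel count quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the talk; the function name is hypothetical, and the storage estimate assumes one byte per voxel, which the abstract does not state.

```python
# Voxel count for one cubic millimeter of tissue imaged at 4 x 4 x 40 nm.
NM_PER_MM = 10**6  # 1 mm = 10^6 nm


def voxels_per_cubic_mm(dx_nm, dy_nm, dz_nm):
    """Number of voxels in 1 mm^3 for a voxel of the given size in nanometers."""
    return (NM_PER_MM // dx_nm) * (NM_PER_MM // dy_nm) * (NM_PER_MM // dz_nm)


voxel_count = voxels_per_cubic_mm(4, 4, 40)
print(f"voxels per mm^3: {voxel_count:.3g}")

# Assumption (not from the abstract): one byte per voxel of raw image data.
BYTES_PER_VOXEL = 1
petabytes = voxel_count * BYTES_PER_VOXEL / 10**15
print(f"raw size at 1 byte per voxel: {petabytes} PB")
```

Under that one-byte assumption, a single cubic millimeter already lands in the petabyte range, which is consistent with the "petabyte-scale dataset" described above.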
  •  
  • 10:30 AM - 10:55 AM

    Perlmutter - A 2020 Pre-Exascale GPU-accelerated System for NERSC. Architecture and Early Application Performance Optimization Results


    Nicholas J. Wright, Perlmutter Chief Architect, Lawrence Berkeley National Laboratory
    Jack Deslippe, Application Performance Group Lead, NERSC
    WATCH NOW
    VIEW PDF
    Software / Developers
       
    •  
       
      The Perlmutter machine will be delivered to NERSC/LBNL in 2020 and contain a mixture of CPU-only and NVIDIA Tesla GPU-accelerated nodes. In this talk we will describe the analysis we performed in order to optimize this design to meet the needs of the broad NERSC workload. We will also discuss early results from our application readiness program, the NERSC Exascale Science Applications Program (NESAP), where we are working with our users to optimize their applications in order to maximize their performance on GPUs in Perlmutter.
  •  
  • 11:00 AM - 11:25 AM

    Progress toward an Earth System Model with Machine Learning Components


    Dr. Richard D. Loft, Director, Technology Development Division Computational and Information Systems Laboratory, National Center for Atmospheric Research
    WATCH NOW
    VIEW PDF
    AI / Machine Learning
       
    •  
       
      Many have speculated that combining GPU computational power with machine learning algorithms could radically improve weather and climate modeling. This talk will discuss an experimental project centered on the Model for Prediction Across Scales-Atmosphere (MPAS-A) to evaluate this program’s prospects of success. Initially, the project set out to determine whether CPU-GPU performance portability could be attained in a single MPAS-A source code by applying OpenACC directives. The initial porting project is nearing completion, and is showing scalability and throughput performance competitive with other state-of-the-art models. At the same time, machine learning scientists at NCAR and elsewhere began looking at the piecemeal replacement of atmospheric parameterizations with machine-learning emulators. This talk will present results from efforts at NCAR to apply machine learning to emulate the atmospheric surface layer and cloud microphysics parameterizations. The talk will also discuss related efforts to tackle radiative transport and other physics components, and will conclude with our own future plans to emulate the complex chemistry of aerosol formation.
  •  
  • 11:30 AM - 11:55 AM
    Simulation
       
    •  
       
      Reservoir simulation is an important component in the recovery of oil and gas from subsurface reservoirs. Reservoir engineers use simulators to understand geology, quantify uncertainty, and then optimize production strategy. Typically, simulations run for hours to days, and many simulations are needed to generate reliable forecasts of oil and gas production. As such, performance is key. INTERSECT is a highly advanced reservoir simulator and mature product which has been widely deployed by a large number of clients worldwide. In this presentation we discuss how we accelerated INTERSECT with GPUs based on NVIDIA’s AMGX library. We discuss the challenges of integrating GPUs in an existing source code and the various design choices and trade-offs that were made along the way. Furthermore, we compare the performance of INTERSECT with and without GPU acceleration.
  •  
  • 12:00 PM - 12:25 PM

    GPUDirect Storage: Transfer Data Directly to GPU Memory, Alleviating IO Bottlenecks


    CJ Newburn, Principal Architect for HPC, NVIDIA Compute Software, NVIDIA
       
    •  
       
      GPUDirect Storage is a new technology that enables a direct data path between storage devices and the GPU. Eliminating unnecessary memory copies through the CPU boosts bandwidth, lowers latency, and reduces CPU and GPU overhead. It is the easiest way to scale performance when IO to the GPU is a bottleneck. In this talk we’ll explain the technology, its benefits, and the end-to-end use cases. We will also introduce distributed file system partners supporting GPUDirect Storage.
  •  
  • 12:30 PM - 12:55 PM

    Beyond the CPU: Is Accelerated Computing for Everyone?


    Jack Wells, Director of Science, Oak Ridge Leadership Computing Facility (OLCF)
    WATCH NOW
    VIEW PDF
    Accelerated Computing
       
    •  
       
      At the brink of Exascale, it’s clear that massive parallelism at the node level is the path forward. Scientists and engineers need highly productive programming environments to speed their time to discovery on today’s HPC systems. In addition to the requirements this puts on compilers and software development tools, researchers must shore up their skills in parallel and accelerated computing in order to be ready for the Exascale era. Join Jack Wells, Director of Science at the ORNL Leadership Computing Facility and Vice President of the OpenACC Organization, as he discusses plans to help the HPC developer community take advantage of today’s fastest supercomputers and prepare for Exascale through hands-on training and education in state-of-the-art programming techniques in 2020 and beyond. Jack will give an overview of how the OpenACC organization's mission is expanding to meet these needs and building on its philosophy of a user-driven OpenACC specification to create a bridge to heterogeneous programming using parallel features in standard C++ and Fortran.
  •  
  • 1:00 PM - 1:25 PM

    Simplifying AI, HPC, and Visualization Workflows with GPU-Optimized Containers from NGC


    Scott McMillan, Senior Solutions Architect, NVIDIA
    Chintan Patel, Sr. Manager, Product Marketing, NVIDIA
    WATCH NOW
    VIEW PDF
    Applications / Containers
       
    •  
       
      NGC is a container registry of GPU-optimized software for AI frameworks, HPC applications, and scientific visualization tools that eliminates complex application installations and provides easy access to the latest versions of the software. Simply pull and run the applications on Docker or Singularity. We will discuss the expansion of NGC with new containers, support for Arm, pre-trained AI models, and deployment tools that simplify the use of NGC offerings in HPC environments.
  •  
  • 1:30 PM - 1:55 PM

    How Language Transforms HPC: Julia and GPUs


    Alan Edelman, Professor of Mathematics, MIT, Computer Science & AI Lab, MIT, Chief Scientist, Julia Computing
    WATCH NOW
    VIEW PDF
    Software / Developers
       
    •  
       
      Julia, through its abstractions, makes reuse of code possible at a high level for CPUs and GPUs. We will demonstrate, through simple examples, how Julia makes general purpose coding of GPUs possible.
  •  
  • 2:00 PM - 2:25 PM

    Landing on Mars: Petascale Unstructured-Grid CFD Simulations on Summit


    Eric Nielsen, Senior Research Scientist, NASA Langley Research Center
       
    •  
       
      We will present a campaign to investigate the use of supersonic retropropulsion as a means to land payloads on Mars large enough to enable human exploration. Simulations are performed on the world’s largest supercomputer, Summit, located at Oak Ridge National Laboratory. The engineering and computational challenges associated with retropropulsion aerodynamics and the need for large-scale resources like Summit are reviewed. For these simulations, a GPU implementation of NASA Langley Research Center's FUN3D flow solver is used. The development history, performance, and scalability are compared with those of contemporary HPC architectures. The use of an optimized GPU-accelerated CFD solver on Summit has enabled simulations well beyond conventional computing paradigms.
  •  
  • 2:30 PM - 2:55 PM

    The Weather Company: High-resolution, Hyper Localized Global Weather Forecasting for the Masses


    Todd Hutchinson, Manager of Numerical Weather Prediction, The Weather Company, an IBM Business
    WATCH NOW
    VIEW PDF
    Software / Developers
       
    •  
       
      The Weather Company (TWC), an IBM Business, runs the world’s first weather prediction system - IBM GRAF (Global High-Resolution Atmospheric Forecast System) driven by MPAS - providing global, rapidly-updating weather forecasts at a very high resolution. This presentation will describe how a cluster of IBM AC922 servers with NVIDIA Volta V100 GPUs delivers weather forecasts that are run globally with a unique ability to predict events that are hyper-local like thunderstorms and updated in minutes versus hours. In order to exploit the capabilities of the high-performance cluster, MPAS was ported to run very efficiently on hundreds of interconnected CPUs and GPUs. Further, the presentation will show how output derived from the weather prediction system is used to aid business decision-making in areas such as commercial aviation, energy and agriculture.
  •  
    •  
    •  
       
      The National Energy Technology Laboratory (NETL) has been exploring the use of TensorFlow (TF) for general scientific and engineering computations within High Performance Computing (HPC) environments, which might include Machine Learning (ML). For instance, NETL recently developed a novel stiff chemistry solver implemented in TF and achieved a ~300x speedup over serial LSODA and a ~35x speedup over parallel LSODA. Further, NETL recently developed a TF-based single-phase fluid solver and achieved a ~3.1x improvement over 40 ranks of MPI on CPU (benchmarking results will be presented at DOE’s theater in a related talk). Researchers at NETL have found that TF is an incredibly easy-to-use tensor algebra package that supports the highest performing hardware in the world, runs efficiently, and is easy to interface with existing HPC software. This talk will reveal the recently developed methodology NETL is using to accelerate Computational Fluid Dynamics (CFD) and add Machine Learning. NETL will discuss how to set up a computation workflow, several “gotcha” issues and how to deal with them, how to integrate ML into the workflow, how to run a TF graph from an existing application, and how to call an existing application from within a TF graph.
  •  
  • 3:30 PM - 3:55 PM
       
    •  
       
      AI techniques have been around for more than five decades, but only in the 2000s have we seen neural networks reach commercial use and machine learning techniques start surpassing traditional methods in picture recognition, natural language processing, and other tasks. Probably the most important piece enabling AI was the use of GPUs to train models, enabling great speedup compared to CPUs. Running distributed machine learning on a large number of GPUs requires movement of a large amount of data between the GPUs or between GPUs and the parameter server, imposing a heavy load on the interconnect, which now becomes the new bottleneck. Creating an efficient system for distributed machine learning requires not only the best processing engines and latest GPU model but also an efficient high-performance interconnect technology to enable efficient utilization of the GPUs and near-linear scaling. Mellanox focuses on CPU offload technologies designed to process data as it moves through the network, either by the Host Channel Adapter or the switch. This frees up CPU and GPU cycles for computation, reduces the amount of data transferred over the network, allows for efficient pipelining of network and computation, and provides for very low communication latencies and overheads. We will present the special requirements imposed on the interconnect by distributed machine learning applications, and describe the latest interconnect technologies allowing efficient data transfer and processing.
  •  
  • 4:00 PM - 4:25 PM
    WATCH NOW
    Accelerated Computing
       
    •  
       
      Low-precision floating-point arithmetic is a powerful tool for accelerating scientific computing applications, especially those in artificial intelligence. Here, we present an investigation showing that other high-performance computing (HPC) applications can also harness this power. Specifically, we use the general HPC problem, Ax=b, where A is a large dense matrix, and a double precision (FP64) solution is needed for accuracy. Our approach is based on mixed-precision (FP16 and FP64) iterative refinement, and we generalize and extend prior advances into a framework, for which we develop architecture-specific algorithms and highly tuned implementations. These new methods show how using half-precision Tensor Cores (FP16-TC) for the arithmetic can provide up to 4× speedup. This is due to the performance boost that the FP16-TC provide as well as to the improved accuracy over the classical FP16 arithmetic that is obtained because the GEMM accumulation occurs in FP32 arithmetic.
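The idea of mixed-precision iterative refinement can be sketched in plain Python. This is not the speakers' implementation: it assumes a tiny dense 3x3 system, uses FP32 rounding (via the standard `struct` module) as a stand-in for the FP16 Tensor Core arithmetic in the abstract, and all function names are hypothetical. The factorization and triangular solves run in low precision, while residuals and solution updates stay in FP64.

```python
import struct


def to_low(x):
    """Round an FP64 value to FP32 (stand-in for the low-precision format)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_low(A):
    """LU with partial pivoting; every stored value is rounded to low precision."""
    n = len(A)
    LU = [[to_low(v) for v in row] for row in A]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] = to_low(LU[i][k] / LU[k][k])
            for j in range(k + 1, n):
                LU[i][j] = to_low(LU[i][j] - LU[i][k] * LU[k][j])
    return LU, perm


def lu_solve_low(LU, perm, b):
    """Forward and back substitution, rounded to low precision at each step."""
    n = len(LU)
    y = [to_low(b[perm[i]]) for i in range(n)]
    for i in range(n):
        for j in range(i):
            y[i] = to_low(y[i] - LU[i][j] * y[j])
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            y[i] = to_low(y[i] - LU[i][j] * y[j])
        y[i] = to_low(y[i] / LU[i][i])
    return y


def residual(A, x, b):
    """r = b - A x, computed in FP64."""
    return [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]


def solve_refined(A, b, iters=5):
    """Low-precision LU solve followed by FP64 iterative refinement."""
    LU, perm = lu_factor_low(A)
    x = lu_solve_low(LU, perm, b)
    for _ in range(iters):
        d = lu_solve_low(LU, perm, residual(A, x, b))  # cheap correction
        x = [xi + di for xi, di in zip(x, d)]          # FP64 update
    return x


# Illustrative data (assumed, not from the talk).
A = [[4.1, 1.3, 2.2], [1.7, 5.9, 1.1], [2.3, 1.9, 6.7]]
x_true = [0.1, -0.2, 0.3]
b = [sum(a * xj for a, xj in zip(row, x_true)) for row in A]

x_low = lu_solve_low(*lu_factor_low(A), b)
x_ref = solve_refined(A, b)
err_low = max(abs(u - v) for u, v in zip(x_low, x_true))
err_refined = max(abs(u - v) for u, v in zip(x_ref, x_true))
print("low-precision error:", err_low)
print("refined error:", err_refined)
```

The expensive factorization is done once in the cheap format, and each refinement step reuses it, so the extra cost of recovering FP64-level accuracy is only a few residual evaluations and triangular solves.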
  •  
  • 4:30 PM - 4:55 PM

    CUDA: New Features and Updates


    Stephen Jones, Principal Software Engineer, NVIDIA
    WATCH NOW
    VIEW PDF
    Software / Developers
       
    •  
       
      Over the past year numerous updates to the CUDA platform have been released for libraries, language and system software. These target a range of diverse features from mixed precision solvers to scalable programming models to memory management to applications of ray tracing in numerical methods. This talk will present a tour of all that’s new and how to take advantage of it.
  •  
  • 5:00 PM - 5:25 PM

    DOE Exascale Computing Project Data Science Co-Design Center Advances


    James Ang, Chief Scientist for Computing in the Physical and Computational Sciences Directorate, DOE-SC/ASCR Sector Lead, Pacific Northwest National Laboratory
    WATCH NOW
    Accelerated Computing
       
    •  
       
      The U.S. Department of Energy’s Exascale Computing Project (ECP) established two co-design centers that are focused on data sciences, ExaGraph and ExaLearn. This talk will provide a brief overview of recent accomplishments by PNNL teams that are supported by these two co-design centers. The ExaGraph team designed and developed a scalable hybrid CPU-GPU influence maximization algorithm called CuRipples, and has collected performance measurements on three different state-of-the-art multi-GPU systems. This talk will also present recent ExaLearn team results from the application of convolutional neural networks to study clusters of water molecules as a graph structure.
  •  
  • 5:30 PM - 5:55 PM
    VIEW PDF
    Software / Developers
       
    •  
       
      We introduce cuTENSOR, a high-performance CUDA library for tensor operations that efficiently handles the ubiquitous presence of high-dimensional arrays (i.e., tensors) in today's HPC and DL workloads. This library supports highly efficient tensor operations such as tensor contractions (a generalization of matrix-matrix multiplications), element-wise tensor operations such as tensor permutations, and tensor reductions. While providing high performance, cuTENSOR also allows users to express their mathematical equations for tensors in a straight-forward way that hides the complexity of dealing with these high-dimensional objects behind an easy-to-use API.
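To illustrate why a tensor contraction is described here as a generalization of matrix-matrix multiplication, the following plain-Python sketch contracts two small tensors over a pair of shared indices and compares the result with an ordinary matrix product on flattened operands. It does not use or imitate the cuTENSOR API; the shapes, data, and function names are assumptions chosen for illustration.

```python
def contract(A, B):
    """C[m][n] = sum over k and l of A[m][k][l] * B[k][l][n]."""
    M, K, L = len(A), len(A[0]), len(A[0][0])
    N = len(B[0][0])
    return [
        [
            sum(A[m][k][l] * B[k][l][n] for k in range(K) for l in range(L))
            for n in range(N)
        ]
        for m in range(M)
    ]


def matmul(X, Y):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


# Small integer tensors (illustrative data): A is 2x2x3, B is 2x3x2.
M, K, L, N = 2, 2, 3, 2
A = [[[m + 2 * k + l for l in range(L)] for k in range(K)] for m in range(M)]
B = [[[k - l + 3 * n for n in range(N)] for l in range(L)] for k in range(K)]

# Merging the contracted indices (k, l) into one index turns the
# contraction into an ordinary (M x K*L) times (K*L x N) matrix product.
A_flat = [[A[m][k][l] for k in range(K) for l in range(L)] for m in range(M)]
B_flat = [B[k][l] for k in range(K) for l in range(L)]

C = contract(A, B)
print(C == matmul(A_flat, B_flat))
```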
  •  

THURSDAY 11/21

  • 10:00 AM - 10:25 AM

    Exascale Deep Learning to Accelerate Cancer Research


    Travis Johnston, Research Scientist, Oak Ridge National Laboratory
    WATCH NOW
    Software / Developers
       
    •  
       
      The neural network architecture (e.g., number of layers, types of layers, connections between layers) plays a critical role in determining what, if anything, the neural network is able to learn from the training data. The trend for neural network architectures, especially those trained on ImageNet, has been to grow ever deeper and more complex. The result has been ever-increasing accuracy on benchmark datasets at the cost of increased computational demands. In this talk we demonstrate that neural network architectures can be automatically generated and tailored for a specific application with dual objectives: accuracy of prediction and speed of prediction. Using MENNDL, an HPC-enabled software stack for neural architecture search, we generate a neural network for a cancer pathology dataset that has accuracy comparable to state-of-the-art networks and is 16x faster at inference. The speedup in inference is necessary because of the volume and velocity of cancer pathology data; specifically, the previous state-of-the-art networks are too slow for individual researchers without access to HPC systems to keep pace with the rate of data generation. Our new model enables researchers with modest computational resources to analyze newly generated data faster than it is collected.
  •  
  • 10:30 AM - 10:55 AM

    Boosting GPU Performance in a Highly Complex HPC Environment


    Scott Suchyta, Senior Director, Enterprise Partners & Professional Services, Altair
    WATCH NOW
    VIEW PDF
    Accelerated Computing
       
    •  
       
      We will provide an overview of how tracking the operational status of GPUs and including it in the job scheduler's decisions helps give users optimal job placement. It also helps administrators understand GPU resource usage when planning resource allocation. The Information Technology Center (ITC) of The University of Tokyo, Hewlett Packard Enterprise (HPE), and Altair Engineering, Inc. set up GPU monitoring using NVIDIA Data Center GPU Manager (DCGM) and have deployed it on the Reedbush system.
  •  
  • 11:00 AM - 11:25 AM
    WATCH NOW
    VIEW PDF
    Software / Developers
       
    •  
       
      The parallel algorithms that were introduced in C++17 were designed to support GPU parallel programming. We have implemented these parallel algorithms in the PGI C++ compiler for NVIDIA GPUs, making it possible in some cases to run standard C++ on GPUs with no directives, pragmas, or annotations, and with performance similar to other GPU programming models. We will share our experiences and performance results, and explain the capabilities of the PGI implementation.
  •  
  • 11:30 AM - 11:55 AM

    GPU-Accelerated Big Data Pipelines for Desktop, HPC and Cloud


    Dr. Melissa Smith, Associate Professor, Electrical and Computer Engineering, Clemson University
    Benjamin Shealy, PhD Candidate, Clemson University
    WATCH NOW
    VIEW PDF
    Data Analytics
       
    •  
       
      Big data pipelines are emerging as a common approach to scientific data analysis, in which large amounts of data are put through multiple stages of processing. In many cases, these pipelines can take advantage of hardware accelerators such as GPUs, and they can be run on a variety of systems ranging from local workstations to HPC clusters and cloud platforms. Here we present a big data pipeline for KINC, a network construction tool from the field of bioinformatics. We demonstrate how it can be run seamlessly on any of the aforementioned systems and how it can take advantage of a GPU cluster to achieve massive speedup.
  •  
  • 12:00 PM - 12:25 PM

    Ethernet Accelerated Machine Learning & GPU Pods


    David Iles, Senior Director of Ethernet Switching, Mellanox Technologies
       
    •  
       
      Squeeze the most performance from your machine learning & GPU Pods with Ethernet switches that are built to accelerate these workloads. An AI-optimized switch delivers more than just low latency and high packet rates. It also offers advanced congestion handling for RDMA, easy RoCE configuration, and workload-specific telemetry, such as Mellanox’s What Just Happened visibility innovation.
  •  
  • 1:00 PM - 1:25 PM

    Video Bots to Serve 350 Million Customers


    Kaumudi Nivarthi, Principal Strategic Program Manager, Industry Vertical Group, Oracle Cloud Infrastructure
    WATCH NOW
    VIEW PDF
    AI / Machine Learning
       
    •  
       
      Reliance Jio has 350 million subscribers. Oracle Cloud and NVIDIA GPUs enable Jio to provide new and innovative ways to interact with its customers. Across multiple products, Reliance uses Bare Metal GPUs on Oracle Cloud to answer over 300 million utterances and train with over 100 million parameters. With speech-to-text, text-to-speech, and natural language processing, Reliance has reduced its training time by 2.4X. Reliance’s Jio interact uses SparkML for an ensemble model to create the world’s first AI-based video call platform. In this session, Oracle will present the framework it has used to enable its customers and demonstrate how Oracle Cloud Infrastructure has decreased time to market and reduced overall ML costs.
  •  
  • 1:30 PM - 1:55 PM

    Accelerating HPC at Supercomputing Scale on Microsoft Azure using GPUs


    Ian Finder, Senior Program Manager, Azure Specialized Compute, Microsoft
    WATCH NOW
    Accelerated Computing
       
    •  
       
      Microsoft and NVIDIA are continuously partnering to provide HPC customers with tools to push the boundaries of innovation while saving time and money. Come learn how recently announced state-of-the-art infrastructure on Azure, powered by large clusters of NVIDIA V100 Tensor Core GPUs connected by InfiniBand (IB), empowers HPC practitioners to scale their HPC runs to hundreds of GPUs and solve previously unattainable problems.
  •  
  • 2:00 PM - 2:25 PM

    The CUDA C++ Standard Library


    Bryce Adelstein Lelbach, CUDA C++ Core Libraries Lead, NVIDIA
    WATCH NOW
    VIEW PDF
    Software / Developers
       
    •  
       
      CUDA C++ is an extension of the ISO C++ language which allows you to use familiar C++ tools to write parallel programs that run on GPUs. The depth and breadth of C++ language support has been a major element of CUDA’s roadmap, with a number of significant features enabled in the most recent release. In this example-oriented talk we’ll give an in-depth review of the newest and upcoming C++ capabilities, and explain how they can be used to build complex concurrent data structures and enable new classes of applications on modern NVIDIA GPUs.
  •  
  • 2:30 PM - 2:55 PM

    Volkswagen Group Research Works with Altair and Uses NVIDIA Technology on AWS to Accelerate Aerodynamics Concept Design


    Nicola Venuti, Sr. HPC Specialist Solution Architect, AWS
    Rick Watkins, Director, Business Development, Appliances, Altair
    WATCH NOW
    Accelerated Computing
       
    •  
       
      Simulation on NVIDIA GPUs on AWS is a game-changer for aerodynamic development in the automotive industry. The team at Altair believes this evolution in development will help to further optimize fuel efficiency and improve the range of electric vehicles while allowing for flexibility in the choices and changes made by stylists. The computational cost savings that can be achieved are also significant. In this session, Altair and AWS discuss how, by using ultraFluidX on GPUs instead of a CPU-based CFD solver, Volkswagen could save up to 70 percent of its existing hardware cost.
  •