If you have any questions or would like to reach the GTC Content Team, please contact us.
Participate at GTC with a talk on a GPU-related topic and share your vision with many of the world's brightest minds. Presenters receive a GTC conference pass.
Submissions must be about your work using GPUs; they may describe completed work or work currently in progress. Submissions must provide actual or expected results or accomplishments and must demonstrate a significant innovation or improvement enabled by GPU computing.
TALK REVIEW PROCESS
The GTC Content Committee will review, rate, and select submissions based on:
VENDOR SUBMISSIONS
If your submission is focused on a service, technology, or a new product your company is offering, please contact us for information on sponsored session opportunities.
You will be required to provide the following information in your submission:
How to Write a Great Session Title
Your session title is what gets the reader to read the first sentence of the session description. A clearly articulated title that conveys the session's learning objective, with a dash of pizzazz, greatly increases the chance that conference attendees will choose your session.
The Title Should:
How to Write a Great Session Description
The first sentence should describe what the attendee can expect to learn from your presentation (e.g., "Learn about extensions that enable efficient use of PGAS models."). Avoid background your audience already knows (e.g., "Originally designed as graphics accelerators, GPUs have evolved into powerful parallel processors capable of accelerating many compute-intensive applications."). Subsequent sentences should offer more details about what will be covered and why the reader should attend. In general, go for clarity over cleverness.
The Description Should Begin with an Action Word Such As:
Select the Correct Duration for Your Submission:
GTC is soliciting submissions that provide concrete examples and contain both practical and theoretical information.
Two speakers may be accepted for a 50-minute talk if you demonstrate that the second speaker is necessary by describing his or her role in the presentation. A 50-minute panel should have no more than five speakers, and an 80-minute tutorial no more than three.
Session Title: Faster, Cheaper, Better – Hybridization of Linear Algebra for GPUs
Session Description: Learn how to develop faster, cheaper, and better linear algebra software for GPUs through a hybridization methodology built on (1) representing linear algebra algorithms as directed acyclic graphs (DAGs), where nodes correspond to tasks and edges to dependencies among them, and (2) scheduling the execution of those tasks over hybrid architectures of GPUs and multicore CPUs. Examples will be given using MAGMA, a new generation of linear algebra libraries that extends sequential LAPACK-style algorithms to highly parallel, heterogeneous GPU and multicore architectures.
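The DAG-based hybridization idea in the abstract above can be illustrated with a minimal sketch (plain Python, not MAGMA's actual API): tasks are nodes, dependencies are edges, and a scheduler releases any task whose predecessors have completed. The task names and dependency chain below are hypothetical, loosely modeled on tiled Cholesky factorization.

```python
from collections import defaultdict, deque

def schedule(tasks, deps):
    """Topologically order a task DAG.

    tasks: iterable of task names.
    deps:  dict mapping a task to the list of tasks it depends on.
    Returns the order in which tasks become ready to execute.
    Illustrative only -- a real hybrid scheduler would dispatch
    ready tasks to GPU and CPU workers concurrently.
    """
    indegree = {t: 0 for t in tasks}
    children = defaultdict(list)
    for t, preds in deps.items():
        for p in preds:
            indegree[t] += 1
            children[p].append(t)
    ready = deque(t for t in tasks if indegree[t] == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for c in children[t]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    return order

# Toy tile dependency chain: factor a panel, then apply updates.
tasks = ["potrf1", "trsm1", "gemm1", "potrf2"]
deps = {"trsm1": ["potrf1"], "gemm1": ["trsm1"], "potrf2": ["gemm1"]}
print(schedule(tasks, deps))  # ['potrf1', 'trsm1', 'gemm1', 'potrf2']
```

With a richer DAG, several tasks become ready at once; that slack is what a hybrid scheduler exploits to keep both the GPU and the CPU cores busy.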
Session Title: Analysis-Driven Performance Optimization: A New Approach to Determining Performance Thresholds
Session Description: The goal of this session is to demystify performance optimization by transforming it into an analysis-driven process. There are three fundamental limiters to kernel performance: instruction throughput, memory throughput, and latency. In this session we will describe: how to use profiling tools and source code instrumentation to assess the significance of each limiter; what optimizations to apply for each limiter; how to determine when hardware limits are reached. Concepts will be illustrated with some examples and are equally applicable to both CUDA and OpenCL development. It is assumed that registrants are already familiar with the fundamental optimization techniques.
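The limiter analysis sketched in the abstract above can be reduced to a back-of-the-envelope calculation (this is an illustrative sketch with assumed hardware numbers, not the speakers' actual tooling): compare a kernel's arithmetic intensity against the machine balance to decide whether instruction throughput or memory throughput is the binding limit.

```python
def binding_limiter(flops, bytes_moved, peak_flops, peak_bw):
    """Classify a kernel as compute- or memory-throughput bound.

    Compares the kernel's arithmetic intensity (flop/byte) with the
    machine balance (peak flop rate / peak memory bandwidth).
    Peak numbers are assumptions -- substitute your GPU's specs.
    """
    intensity = flops / bytes_moved  # flops per byte of traffic
    balance = peak_flops / peak_bw   # flop/byte at which the limits cross
    return "compute-bound" if intensity > balance else "memory-bound"

# Example: a SAXPY-like kernel performs 2 flops per 12 bytes moved.
# Assumed peaks: 1 Tflop/s and 150 GB/s (illustrative values only).
print(binding_limiter(2.0, 12.0, 1e12, 150e9))  # memory-bound
```

Latency, the third limiter, does not show up in this ratio; it binds when the kernel exposes too little parallelism to reach either throughput peak, which is why profiling tools are still needed to complete the analysis.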
Our runtime designs focus on performance while ensuring truly one-sided progress of communication, which is critical for PGAS models. These designs demonstrate significant potential for users of PGAS models, as well as hybrid MPI+PGAS models (as available in MVAPICH2-X), to take advantage of NVIDIA GPUs. The extensions in OpenSHMEM, coupled with an optimized runtime, improve the latency of the GPU-to-GPU shmem_getmem operation by up to 90%, 45%, and 42% for intra-IOH (I/O Hub), inter-IOH, and inter-node configurations, respectively. The proposed extensions and the associated runtime reduce the GPU-to-GPU latency of a 4-byte Put to 2.7 us. The proposed enhancements improve the performance of the Stencil2D kernel by 65% on a cluster of 192 GPUs and of the BFS kernel by 12% on a cluster of 96 GPUs. As part of the talk, we will use benchmarks from the popular OSU micro-benchmark suite, along with application kernels, to demonstrate how to use the new extensions and extract performance benefits from the associated runtime designs.