Kepler Architecture



Get 3x the performance with NVIDIA® Kepler, the world's fastest and most efficient high-performance computing (HPC) architecture. With innovative computing technology and features, it applies to a broader range of scientific computing applications and makes hybrid computing more accessible to application developers and researchers.

Kepler's breakthrough performance is made possible by:

Kepler SMX processing

Delivers more processing performance and efficiency through a new streaming multiprocessor design that devotes a greater percentage of chip space to processing cores than to control logic.

Dynamic Parallelism

Simplifies GPU programming by letting programmers accelerate parallel nested loops: the GPU dynamically spawns new threads on its own without going back to the CPU.

Hyper-Q

Slashes CPU idle time by allowing multiple CPU cores to use a single Kepler GPU simultaneously, dramatically advancing programmability and efficiency.

Kepler SMX processing

SMX achieves higher performance and efficiency by increasing the number of processing cores while reducing control logic.

Dynamic Parallelism

With Dynamic Parallelism, a Kepler GPU spawns new threads on its own, adapting to the data without going back to the CPU. This greatly simplifies GPU programming and accelerates a broader set of popular algorithms.


With Dynamic Parallelism, the grid resolution of a simulation can be determined dynamically at runtime. The simulation can "zoom in" on areas of interest and avoid unnecessary calculation in areas with little change.
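The "zoom in" idea can be illustrated with a minimal sketch in which a parent kernel inspects each coarse cell and launches a finer child grid only where the data calls for it. Everything named here (`refine_cell`, `adaptive_step`, `run_adaptive_step`, the `activity` array, the threshold, and the sub-cell count) is a hypothetical stand-in rather than part of any NVIDIA sample, and the "work" is just a counter. The sketch assumes a GPU of compute capability 3.5 or higher and a build with relocatable device code enabled, for example `nvcc -rdc=true example.cu -lcudadevrt` with an `-arch` value suited to the target GPU.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Child kernel: touches every fine sub-cell of one coarse cell.
// The atomicAdd is a stand-in for real fine-grained computation.
__global__ void refine_cell(int cell, int fine_per_cell, int *work_done)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < fine_per_cell) {
        atomicAdd(&work_done[cell], 1);
    }
}

// Parent kernel: one thread per coarse cell decides from the data
// whether that cell needs a finer grid.
__global__ void adaptive_step(const float *activity, int n_cells,
                              float threshold, int fine_per_cell,
                              int *work_done)
{
    int cell = blockIdx.x * blockDim.x + threadIdx.x;
    if (cell >= n_cells) return;

    if (activity[cell] > threshold) {
        int threads = 64;
        int blocks = (fine_per_cell + threads - 1) / threads;
        // Kernel launch issued from device code: no round trip to the CPU.
        refine_cell<<<blocks, threads>>>(cell, fine_per_cell, work_done);
    } else {
        work_done[cell] = 1;  // quiet cell: a single coarse update
    }
}

// Host helper (hypothetical): copies inputs, runs one adaptive step,
// and copies the per-cell work counters back.
cudaError_t run_adaptive_step(const float *h_activity, int n_cells,
                              float threshold, int fine_per_cell,
                              int *h_work)
{
    float *d_activity = nullptr;
    int *d_work = nullptr;
    cudaMalloc(&d_activity, n_cells * sizeof(float));
    cudaMalloc(&d_work, n_cells * sizeof(int));
    cudaMemcpy(d_activity, h_activity, n_cells * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaMemset(d_work, 0, n_cells * sizeof(int));

    int threads = 128;
    int blocks = (n_cells + threads - 1) / threads;
    adaptive_step<<<blocks, threads>>>(d_activity, n_cells, threshold,
                                       fine_per_cell, d_work);

    cudaError_t err = cudaGetLastError();
    // The parent grid is not complete until its child grids are,
    // so this also waits for every refine_cell launch.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) {
        err = cudaMemcpy(h_work, d_work, n_cells * sizeof(int),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(d_activity);
    cudaFree(d_work);
    return err;
}

int main()
{
    const int n_cells = 8;
    const int fine_per_cell = 256;
    float activity[n_cells] = {0.1f, 0.9f, 0.2f, 0.8f, 0.0f, 0.0f, 0.7f, 0.1f};
    int work[n_cells];

    cudaError_t err = run_adaptive_step(activity, n_cells, 0.5f,
                                        fine_per_cell, work);
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int c = 0; c < n_cells; ++c) {
        std::printf("cell %d: %d unit(s) of work\n", c, work[c]);
    }
    return 0;
}
```

In this sketch the host launches a single kernel and never learns which cells were refined until the results come back; the decision and the extra launches happen entirely on the GPU. Without device-side launches, the host would have to read back the activity values, pick the cells to refine, and launch the refinement kernels itself.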


Hyper-Q

Kepler's Hyper-Q increases GPU utilization by providing 32 independent hardware work queues, so that work submitted through separate CUDA streams or by separate MPI ranks can reach the GPU concurrently, leading to improved programmability and efficiency.


Hyper-Q enables multiple CPU cores to launch work on a single GPU simultaneously, thereby dramatically increasing GPU utilization and slashing CPU idle times.
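From the programmer's side, independent work is usually expressed with CUDA streams. The following is a minimal sketch, not a benchmark: it submits unrelated kernels through several streams and checks the results. The kernel names (`fill`, `scale`), the helper `run_streams`, and the stream and buffer counts are illustrative assumptions. Whether the kernels actually overlap depends on the hardware and on available resources; the program gives the same results either way.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Sets every element of data to value.
__global__ void fill(float *data, int n, float value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

// Multiplies every element of data by factor.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Host helper (hypothetical): gives each stream its own buffer, queues
// two dependent kernels per stream, and returns the number of elements
// that do not hold the expected value.
int run_streams(int n_streams, int n)
{
    std::vector<cudaStream_t> streams(n_streams);
    std::vector<float *> bufs(n_streams, nullptr);
    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    for (int s = 0; s < n_streams; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&bufs[s], n * sizeof(float));
    }

    // Work in different streams is independent and may run concurrently.
    // Within one stream, fill still completes before scale starts.
    for (int s = 0; s < n_streams; ++s) {
        fill<<<blocks, threads, 0, streams[s]>>>(bufs[s], n, 1.0f);
        scale<<<blocks, threads, 0, streams[s]>>>(bufs[s], n, float(s + 1));
    }
    cudaDeviceSynchronize();  // wait for all streams to drain

    int mismatches = 0;
    std::vector<float> host(n);
    for (int s = 0; s < n_streams; ++s) {
        cudaMemcpy(host.data(), bufs[s], n * sizeof(float),
                   cudaMemcpyDeviceToHost);
        for (int i = 0; i < n; ++i) {
            if (host[i] != float(s + 1)) ++mismatches;
        }
        cudaFree(bufs[s]);
        cudaStreamDestroy(streams[s]);
    }
    return mismatches;
}

int main()
{
    int mismatches = run_streams(8, 1 << 16);
    std::printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

The design point is that the code only declares which pieces of work are independent. On hardware with a single work queue, such streams can end up serialized behind one another; with multiple hardware queues, the same unchanged program gives the GPU the chance to run them side by side.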