
NVIDIA CUDA SDK - Data-Parallel Algorithms



Separable Convolution

This sample implements a separable convolution filter of a 2D signal with a Gaussian kernel.
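
As a rough illustration of why separability matters, the sketch below filters an image with a row pass followed by a column pass and compares the result with a direct 2D filter. It is plain Python rather than the sample's CUDA code; the function names, the zero-padded border handling, and the kernel parameters are assumptions made for this example only.

```python
import math

def gaussian_kernel(radius, sigma):
    """Return a normalized 1D Gaussian kernel with 2 * radius + 1 taps."""
    taps = [math.exp(-(i * i) / (2.0 * sigma * sigma))
            for i in range(-radius, radius + 1)]
    total = sum(taps)
    return [t / total for t in taps]

def convolve_rows(image, kernel):
    """Filter every row with the 1D kernel; pixels outside the image count as zero."""
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for k in range(-radius, radius + 1):
                xx = x + k
                if 0 <= xx < width:
                    acc += image[y][xx] * kernel[k + radius]
            out[y][x] = acc
    return out

def transpose(image):
    return [list(column) for column in zip(*image)]

def separable_convolve(image, kernel):
    """Row pass, then a column pass done as a row pass on the transposed image."""
    rows_done = convolve_rows(image, kernel)
    return transpose(convolve_rows(transpose(rows_done), kernel))

def direct_convolve(image, kernel):
    """Reference: full 2D filter using the outer product of the 1D kernel."""
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for j in range(-radius, radius + 1):
                for i in range(-radius, radius + 1):
                    yy, xx = y + j, x + i
                    if 0 <= yy < height and 0 <= xx < width:
                        acc += image[yy][xx] * kernel[j + radius] * kernel[i + radius]
            out[y][x] = acc
    return out
```

For a kernel of radius r, the two 1D passes cost 2(2r + 1) multiply-adds per pixel instead of (2r + 1)^2 for the direct filter, which is the main reason to separate the filter.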



Download - Windows x86
Download - Windows x64
Download - Linux/Mac


Texture-based Separable Convolution

A texture-based implementation of a separable 2D convolution with a Gaussian kernel. It is used for performance comparison against convolutionSeparable.




Download - Windows x86
Download - Windows x64
Download - Linux/Mac


Bitonic Sort

Bitonic sort is a simple parallel sorting algorithm that is very efficient when sorting a small number of elements: http://citeseer.ist.psu.edu/blelloch98experimental.html. This implementation is based on: http://www.tools-of-computing.com/tc/CS/Sorts/bitonic_sort.htm
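
A minimal sketch of a bitonic sorting network, written in Python for readability, may help show the structure. The function name and the power-of-two length requirement are assumptions of this sketch; the SDK sample itself is a CUDA kernel and may differ in detail.

```python
def bitonic_sort(values):
    """Return an ascending copy of `values`; the length must be a power of two."""
    n = len(values)
    if n & (n - 1) != 0:
        raise ValueError("length must be a power of two")
    a = list(values)
    k = 2                      # size of the bitonic sequences being merged
    while k <= n:
        j = k // 2             # distance between compared elements
        while j > 0:
            # On a GPU, each index i would be handled by its own thread.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

All compare-and-swap operations inside one pass over `i` touch disjoint pairs, so they can run in parallel; the network needs on the order of log2(n)^2 such passes.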




Download - Windows x86
Download - Windows x64
Download - Linux/Mac


Line of Sight

This sample is an implementation of a simple line-of-sight algorithm: Given a height map and a ray originating at some observation point, it computes all the points along the ray that are visible from the observation point. The implementation is based on the parallel scan primitive provided by the CUDPP library (http://www.gpgpu.org/developer/cudpp/).
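
One common way to express line of sight with a scan is to compute the slope from the observer to each height sample and take a running maximum: a point is visible when no earlier point has a steeper slope. The Python sketch below follows that idea using `itertools.accumulate` in place of a parallel max-scan. The function name, the uniform sample spacing, and the parameter names are assumptions, and the sketch does not use CUDPP.

```python
from itertools import accumulate

def line_of_sight(heights, observer_height, spacing=1.0):
    """Return a visibility flag for each height sample along the ray.

    heights[i] is the terrain height at distance (i + 1) * spacing
    from the observation point.
    """
    # Slope of the line from the observer to each sample.
    slopes = [(h - observer_height) / ((i + 1) * spacing)
              for i, h in enumerate(heights)]
    # Inclusive running maximum: the scan step of the algorithm.
    running_max = list(accumulate(slopes, max))
    # A sample is visible if nothing before it rises to a steeper slope.
    return [s >= m for s, m in zip(slopes, running_max)]
```

For example, `line_of_sight([1.0, 3.0, 2.0, 4.0, 10.0], 2.0)` marks the third sample as hidden, because the second sample blocks it.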




Download - Windows x86
Download - Windows x64
Download - Linux/Mac


N-Body Simulation

This sample demonstrates an efficient all-pairs gravitational n-body simulation in CUDA. This sample accompanies the GPU Gems 3 chapter "Fast N-Body Simulation with CUDA".
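
To make "all-pairs" concrete, the following Python sketch evaluates the gravitational acceleration on every body from every other body, using a softening term so that the self-interaction contributes zero instead of dividing by zero. The function name, the default softening value, and `G = 1.0` are assumptions for illustration; the sample performs the same O(n^2) work in a CUDA kernel.

```python
import math

def accelerations(positions, masses, softening=1e-3, G=1.0):
    """Return one (ax, ay, az) tuple per body using an all-pairs sum.

    `softening` must be positive: it keeps the i == j term finite.
    """
    n = len(positions)
    result = []
    for i in range(n):             # on a GPU, one thread per body i
        xi, yi, zi = positions[i]
        ax = ay = az = 0.0
        for j in range(n):
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dz = positions[j][2] - zi
            dist_sq = dx * dx + dy * dy + dz * dz + softening * softening
            scale = masses[j] / (dist_sq * math.sqrt(dist_sq))
            ax += dx * scale
            ay += dy * scale
            az += dz * scale
        result.append((G * ax, G * ay, G * az))
    return result
```

With two equal masses, the two accelerations come out equal in magnitude and opposite in direction, which is a quick sanity check for any implementation.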



Download - Windows x86
Download - Windows x64
Download - Linux/Mac


Parallel Reduction

A parallel sum reduction that computes the sum of large arrays of values. This sample demonstrates several important optimization strategies for parallel algorithms like reduction.
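
The core of a parallel sum reduction is a tree of pairwise additions. The Python sketch below shows one such tree with sequential addressing, where the active "threads" are always the first `stride` indices. The function name and the zero padding to a power of two are assumptions of this sketch, and it shows only the basic pattern, not the sample's specific optimizations.

```python
def tree_reduce_sum(values):
    """Sum `values` with a pairwise tree, halving the active range each step."""
    data = list(values)
    size = 1
    while size < len(data):
        size *= 2
    data.extend([0] * (size - len(data)))   # pad so every step pairs up evenly
    stride = size // 2
    while stride > 0:
        # Each i corresponds to one GPU thread; all additions in a step are independent.
        for i in range(stride):
            data[i] += data[i + stride]
        stride //= 2
    return data[0]
```

The loop runs log2(size) steps, and within each step no two additions touch the same element, so a GPU can execute them concurrently with a synchronization between steps.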



Download - Windows x86
Download - Windows x64
Download - Linux/Mac


Mandelbrot

This sample uses CUDA to compute and display the Mandelbrot set interactively. It also illustrates the use of "double single" arithmetic to improve precision when zooming a long way into the pattern. The sample uses double-precision hardware if a GTX 200 class GPU is present. Thanks to Mark Granger of NewTek, who submitted this sample to the SDK!
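
"Double single" arithmetic represents one value as an unevaluated sum of two single-precision numbers, a high part and a low part. The Python sketch below emulates single precision by rounding through `struct` and shows only addition, using the standard error-free two-sum. The helper names `f32`, `ds_from_float`, and `ds_add` are hypothetical and are not taken from the sample's source.

```python
import struct

def f32(x):
    """Round a Python float (binary64) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def ds_from_float(x):
    """Split x into a (hi, lo) pair of single-precision values."""
    hi = f32(x)
    return hi, f32(x - hi)

def ds_add(a, b):
    """Add two (hi, lo) pairs, rounding every intermediate to single precision."""
    a_hi, a_lo = a
    b_hi, b_lo = b
    s = f32(a_hi + b_hi)
    v = f32(s - a_hi)
    # Rounding error of the high-part addition (two-sum).
    err = f32(f32(a_hi - f32(s - v)) + f32(b_hi - v))
    # Fold in the low parts, then renormalize into a new (hi, lo) pair.
    err = f32(f32(err + a_lo) + b_lo)
    hi = f32(s + err)
    lo = f32(err - f32(hi - s))
    return hi, lo
```

In plain single precision, adding 2^-30 to 1.0 returns 1.0 and the small term is lost; with `ds_add((1.0, 0.0), (2.0 ** -30, 0.0))` the small term survives in the low part.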




Download - Windows x86
Download - Windows x64
Download - Linux/Mac


Fast Walsh Transform

Naturally (Hadamard) ordered Fast Walsh Transform for batched vectors of arbitrary eligible (power-of-two) lengths.
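
For reference, a naturally ordered fast Walsh-Hadamard transform can be sketched as a sequence of butterfly passes. The version below is unnormalized Python operating on a single vector; the function name is an assumption, and a batch can be handled by applying it to each vector in turn.

```python
def fwht(values):
    """Return the unnormalized Walsh-Hadamard transform in natural (Hadamard) order."""
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterflies within one pass are independent and could run in parallel.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y
        h *= 2
    return a
```

Applying the transform twice returns the input scaled by the vector length, so dividing by `len(values)` after the second pass gives the inverse.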




Download - Windows x86
Download - Windows x64
Download - Linux/Mac


Scan

This example demonstrates an efficient CUDA implementation of parallel prefix sum, also known as "scan". Given an array of numbers, scan computes a new array in which each element is the sum of all the elements before it in the input array.
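
The definition above describes an exclusive scan. One well-known work-efficient formulation uses an up-sweep (reduce) phase followed by a down-sweep phase; the Python sketch below follows that scheme. The function name and the zero padding to a power of two are assumptions, and the sample's CUDA kernels may be organized differently.

```python
def exclusive_scan(values):
    """Return the list whose element i is the sum of values[0:i]."""
    n = len(values)
    size = 1
    while size < n:
        size *= 2
    a = list(values) + [0] * (size - n)

    # Up-sweep: build partial sums in place, as in a tree reduction.
    stride = 1
    while stride < size:
        for i in range(2 * stride - 1, size, 2 * stride):
            a[i] += a[i - stride]
        stride *= 2

    # Down-sweep: clear the root, then push prefix sums back down the tree.
    a[size - 1] = 0
    stride = size // 2
    while stride >= 1:
        for i in range(2 * stride - 1, size, 2 * stride):
            left = a[i - stride]
            a[i - stride] = a[i]
            a[i] += left
        stride //= 2
    return a[:n]
```

For the input `[3, 1, 7, 0, 4, 1, 6, 3]`, the result is `[0, 3, 4, 11, 11, 15, 16, 22]`.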



Download - Windows x86
Download - Windows x64
Download - Linux/Mac


Scan of Large Arrays

This example demonstrates an efficient CUDA implementation of parallel prefix sum (also known as "scan") for arbitrary-sized arrays. Given an array of numbers, scan computes a new array in which each element is the sum of all the elements before it in the input array.
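
A common way to extend scan to arrays larger than one block is to scan each block independently, scan the per-block totals, and add each block's offset back to its elements. The Python sketch below illustrates that decomposition; the function names and the small default `block_size` are assumptions chosen for readability, not values from the sample.

```python
from itertools import accumulate

def block_exclusive_scan(block):
    """Return (exclusive scan of the block, total of the block)."""
    inclusive = list(accumulate(block))
    exclusive = ([0] + inclusive[:-1]) if block else []
    total = inclusive[-1] if inclusive else 0
    return exclusive, total

def scan_large(values, block_size=4):
    """Exclusive scan of an arbitrary-length list, processed block by block."""
    if block_size < 2:
        raise ValueError("block_size must be at least 2")
    values = list(values)
    scanned, totals = [], []
    # Step 1: scan every block on its own (one thread block each on a GPU).
    for start in range(0, len(values), block_size):
        exclusive, total = block_exclusive_scan(values[start:start + block_size])
        scanned.append(exclusive)
        totals.append(total)
    # Step 2: scan the block totals, recursing if they do not fit in one block.
    if len(totals) > block_size:
        offsets = scan_large(totals, block_size)
    else:
        offsets, _ = block_exclusive_scan(totals)
    # Step 3: add each block's offset to all of its elements.
    return [x + offset for block, offset in zip(scanned, offsets) for x in block]
```

The recursion in step 2 is what removes the size limit: each level shrinks the problem by a factor of `block_size` until the totals fit in a single block.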



Download - Windows x86
Download - Windows x64
Download - Linux/Mac

Last Update: 06/15/2009