Get CUDA Performance Strategies

Monte-Carlo Option Pricing with Multi-GPU support

This sample estimates the fair call price for a given set of European options using a Monte Carlo approach, taking advantage of all CUDA-capable GPUs installed in the system.
GeForce® 8 Series
Quadro® FX 5600 or later
Tesla™


Download - Windows
Download - Linux
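
To make the Monte Carlo approach concrete, here is a minimal single-threaded sketch in Python. It is not the sample's code. It assumes the underlying price follows geometric Brownian motion (the description above does not state the model), and the names `mc_european_call` and `black_scholes_call` are hypothetical. The closed-form Black-Scholes price is included only as a sanity check on the estimate.

```python
import math
import random


def mc_european_call(s0, strike, rate, sigma, years, n_paths, seed=0):
    """Monte Carlo estimate of a European call price (assumed GBM model)."""
    rng = random.Random(seed)
    drift = (rate - 0.5 * sigma * sigma) * years
    vol = sigma * math.sqrt(years)
    total_payoff = 0.0
    for _ in range(n_paths):
        # One simulated terminal price per path; on a GPU each thread
        # would typically handle many such paths.
        z = rng.gauss(0.0, 1.0)
        s_t = s0 * math.exp(drift + vol * z)
        total_payoff += max(s_t - strike, 0.0)
    # Discount the average payoff back to today.
    return math.exp(-rate * years) * total_payoff / n_paths


def black_scholes_call(s0, strike, rate, sigma, years):
    """Closed-form reference price, used only to check the estimate."""
    sqrt_t = math.sqrt(years)
    d1 = (math.log(s0 / strike) + (rate + 0.5 * sigma * sigma) * years) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t

    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return s0 * cdf(d1) - strike * math.exp(-rate * years) * cdf(d2)
```

Because the paths are independent, the loop can be split across threads and across devices, with the partial payoff sums combined at the end. That independence is presumably what lets the sample spread the work over several GPUs.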

Matrix Transpose

Efficient matrix transpose.
GeForce® 8 Series
Quadro® FX 5600 or later
Tesla™

Download - Windows
Download - Linux

Clock

This example shows how to use the clock function to measure kernel performance accurately.
GeForce® 8 Series
Quadro® FX 5600 or later
Tesla™

Download - Windows
Download - Linux

Aligned Types

A simple test showing the large gap in access speed between aligned and misaligned structures.
GeForce® 8 Series
Quadro® FX 5600 or later
Tesla™

Download - Windows
Download - Linux

Parallel Reduction

A parallel sum reduction that computes the sum of large arrays of values. This sample demonstrates several important optimization strategies for parallel algorithms like reduction.
GeForce® 8 Series
Quadro® FX 5600 or later
Tesla™

Whitepaper
Download - Windows
Download - Linux
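
As a rough illustration of what a parallel sum reduction does, the Python sketch below emulates a tree-shaped reduction sequentially. It is not the sample's kernel. The function name `tree_reduce_sum` is hypothetical, and the halving-stride pattern shown here is one common arrangement, assumed for illustration rather than taken from the sample.

```python
def tree_reduce_sum(values):
    """Sum values with a tree-shaped reduction, emulated sequentially."""
    data = list(values)
    n = len(data)
    if n == 0:
        return 0
    # Pad with zeros up to the next power of two so every step halves cleanly.
    size = 1
    while size < n:
        size *= 2
    data.extend([0] * (size - n))
    stride = size // 2
    while stride > 0:
        # Each index i stands in for one thread; on a GPU all additions
        # within a step would run concurrently, followed by a barrier.
        for i in range(stride):
            data[i] += data[i + stride]
        stride //= 2
    return data[0]
```

The result lands in `data[0]` after about log2(n) steps, compared with n - 1 dependent additions in a plain sequential loop.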

asyncAPI

This sample uses CUDA streams and events to overlap execution on CPU and GPU.
GeForce® 8 Series
Quadro® FX 5600 or later
Tesla™

Download - Windows
Download - Linux

simpleStreams

This sample uses CUDA streams to overlap kernel executions with memcopies between the device and the host.
GeForce® 8 Series
Quadro® FX 5600 or later
Tesla™

Download - Windows
Download - Linux

Bandwidth Test

This is a simple test program to measure the memcopy bandwidth of the GPU. It can currently measure device-to-device copy bandwidth, host-to-device copy bandwidth for pageable and page-locked memory, and device-to-host copy bandwidth for pageable and page-locked memory.
GeForce® 8 Series
Quadro® FX 5600 or later
Tesla™

Download - Windows
Download - Linux

Scan

This example demonstrates an efficient CUDA implementation of parallel prefix sum, also known as "scan". Given an array of numbers, scan computes a new array in which each element is the sum of all the elements before it in the input array.
GeForce® 8 Series
Quadro® FX 5600 or later
Tesla™

Whitepaper
Download - Windows
Download - Linux
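
The definition of scan given above (each output element is the sum of all input elements before it) can be written as a short sequential reference in Python. This is only a sketch of the expected result, useful for checking a parallel implementation. It is not the sample's algorithm, and the name `exclusive_scan` is hypothetical.

```python
def exclusive_scan(values):
    """Return the exclusive prefix sum of values.

    out[i] is the sum of values[0] .. values[i-1], so out[0] is 0.
    """
    out = []
    running = 0
    for v in values:
        out.append(running)  # sum of everything strictly before this element
        running += v
    return out
```

For the input `[3, 1, 7, 0, 4, 1, 6, 3]` this returns `[0, 3, 4, 11, 11, 15, 16, 22]`.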

Scan of Large Arrays

This example demonstrates an efficient CUDA implementation of parallel prefix sum (also known as "scan") for arbitrary-sized arrays. Given an array of numbers, scan computes a new array in which each element is the sum of all the elements before it in the input array.
GeForce® 8 Series
Quadro® FX 5600 or later
Tesla™

Whitepaper
Download - Windows
Download - Linux
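
One common way to extend scan to arrays larger than a single block is to scan fixed-size blocks independently, scan the per-block totals, and then add each block's offset back in. The Python sketch below emulates that idea sequentially. Whether the sample uses exactly this decomposition is an assumption, and the names `exclusive_scan`, `blocked_exclusive_scan` and `block_size` are hypothetical.

```python
def exclusive_scan(values):
    """Sequential exclusive prefix sum, used per block and on block totals."""
    out = []
    running = 0
    for v in values:
        out.append(running)
        running += v
    return out


def blocked_exclusive_scan(values, block_size=4):
    """Exclusive scan of an arbitrary-length sequence, done block by block."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    data = list(values)
    # Step 1: split into blocks and scan each one independently.
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    local_scans = [exclusive_scan(block) for block in blocks]
    # Step 2: scan the per-block totals to get each block's starting offset.
    offsets = exclusive_scan([sum(block) for block in blocks])
    # Step 3: add the offset to every element of the matching block.
    return [x + off for scanned, off in zip(local_scans, offsets) for x in scanned]
```

Steps 1 and 3 are independent per block, which is what makes the scheme attractive on a GPU. The last block may be shorter than `block_size`, so the input length does not need to be a multiple of it.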

 

© 2011 NVIDIA Corporation