GPU-Accelerated TensorFlow

Get started today with this GPU-Ready Apps guide.


TensorFlow is a software library for designing and deploying numerical computations, with a key focus on applications in machine learning. The library allows algorithms to be described as a graph of connected operations that can be executed on various GPU-enabled platforms ranging from portable devices to desktops to high-end servers.

TensorFlow runs up to 50% faster on the latest Pascal GPUs and scales well across multiple GPUs, so you can train models in hours instead of days.


System Requirements

The GPU-enabled version of TensorFlow has the following requirements:

  • 64-bit Linux
  • Python 2.7
  • CUDA 7.5 (CUDA 8.0 required for Pascal GPUs)
  • cuDNN v5.1 (cuDNN v6 if on TF v1.3)


You will also need an NVIDIA GPU supporting compute capability 3.0 or higher.

Download and Installation Instructions

TensorFlow is distributed under an Apache v2 open source license on GitHub. This guide walks through installing TensorFlow on an Ubuntu 16.04 machine with one or more NVIDIA GPUs.

The TensorFlow site is a great resource on how to install the latest released versions with virtualenv or Docker, or by building from source.


1. Update/install NVIDIA drivers.

Install up-to-date NVIDIA drivers for your system.

$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt update   # re-run if you see any warning or error messages
$ sudo apt-get install nvidia-375   # press Tab after "nvidia-" to list available versions; do not use 378, which may cause login loops

Reboot to let the graphics driver take effect.

2. Install and test CUDA.

To use TensorFlow with NVIDIA GPUs, the first step is to install the CUDA Toolkit by following the official documentation. The steps for CUDA 8.0 are listed here for quick reference:

Navigate to

Select Linux, x86_64, Ubuntu, 16.04, deb (local).
$ sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb   # the deb file you downloaded
$ sudo apt-get update
$ sudo apt-get install cuda

If you encounter a message suggesting that you re-run sudo apt-get update, please do so and then re-run sudo apt-get install cuda.

$ export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
$ export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
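
If you want to double-check that the two export commands took effect, a small Python check along these lines may help. This is only a sketch and not part of the official installation steps: the helper name missing_cuda_paths is hypothetical, and it assumes the /usr/local/cuda-8.0 prefix used in the commands above.

```python
import os

# Install prefix taken from the export commands above (adjust if yours differs).
CUDA_HOME = "/usr/local/cuda-8.0"


def missing_cuda_paths(environ, cuda_home=CUDA_HOME):
    """Return (variable, directory) pairs that the environment is missing."""
    expected = {
        "PATH": cuda_home + "/bin",
        "LD_LIBRARY_PATH": cuda_home + "/lib64",
    }
    missing = []
    for var, directory in sorted(expected.items()):
        # Linux search paths are colon-separated lists of directories.
        entries = environ.get(var, "").split(":")
        if directory not in entries:
            missing.append((var, directory))
    return missing


if __name__ == "__main__":
    problems = missing_cuda_paths(os.environ)
    for var, directory in problems:
        print("{} does not contain {}".format(var, directory))
    if not problems:
        print("PATH and LD_LIBRARY_PATH both include the CUDA directories.")
```

Note that export only affects the current shell, so run the check from the same terminal in which you set the variables.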

Test your CUDA installation:

$ cd /usr/local/cuda-8.0/samples/5_Simulations/nbody
$ sudo make
$ ./nbody

If successful, a new window will pop up running the n-body simulation.

3. Install cuDNN.

Once the CUDA Toolkit is installed, download the cuDNN v5.1 Library (cuDNN v6 if on TF v1.3) for Linux and install it by following the official documentation. (Note: You will need to register for the Accelerated Computing Developer Program.) The steps for cuDNN v5.1 are listed here for quick reference:

Once downloaded, navigate to the directory containing cuDNN:

$ tar -xzvf cudnn-8.0-linux-x64-v5.1.tgz
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

Note: Steps above are similar for cuDNN v6.
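
To confirm which cuDNN version ended up in the CUDA directory, you can read the version macros from the copied header. The sketch below assumes that cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL, as the v5.1 and v6 headers do; the function name parse_cudnn_version is hypothetical, and the header path is the one used in the copy commands above.

```python
import os
import re

# Destination of the "sudo cp cuda/include/cudnn.h" step above.
HEADER_PATH = "/usr/local/cuda/include/cudnn.h"


def parse_cudnn_version(header_text):
    """Extract (major, minor, patch) from the text of cudnn.h."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+" + name + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            raise ValueError("could not find " + name + " in header")
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    if os.path.isfile(HEADER_PATH):
        with open(HEADER_PATH) as header:
            print("cuDNN version: %d.%d.%d" % parse_cudnn_version(header.read()))
    else:
        print("No header found at " + HEADER_PATH)
```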

Now that the prerequisites are installed, we can build and install TensorFlow.

4. Prepare TensorFlow dependencies and required packages.

$ sudo apt-get install libcupti-dev

5. Install TensorFlow (GPU-accelerated version).

$ pip install tensorflow-gpu

6. Verify a successful installation.

Let’s quickly verify a successful installation. First, close all open terminals and open a new one.

Change directory (cd) to any directory on your system other than the tensorflow subdirectory from which you invoked the configure command.

Invoke Python by typing python at the command line.

Input the following short program:

>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print(sess.run(hello))

You should see “Hello, TensorFlow!”. Congratulations! You may also input “print(tf.__version__)” to see the installed TensorFlow’s version.
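
The hello-world check confirms that TensorFlow imports and runs, but not that it sees your GPU. One possible way to check is sketched below. It relies on device_lib.list_local_devices() from the tensorflow.python.client module, which is an internal module rather than a documented public API, so treat its availability in your installed version as an assumption. The helper name gpu_names is hypothetical.

```python
def gpu_names(devices):
    """Return the names of GPU devices from (name, device_type) pairs."""
    return [name for name, device_type in devices if device_type == "GPU"]


if __name__ == "__main__":
    try:
        # Assumption: this internal module exists in the installed version.
        from tensorflow.python.client import device_lib
    except ImportError:
        print("TensorFlow could not be imported in this environment.")
    else:
        local = [(d.name, d.device_type)
                 for d in device_lib.list_local_devices()]
        found = gpu_names(local)
        if found:
            print("GPU devices visible to TensorFlow: " + ", ".join(found))
        else:
            print("No GPU devices are visible to TensorFlow.")
```

If no GPU is listed, re-check the driver, CUDA, and cuDNN steps before moving on.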

Training Models

TensorFlow can be used via Python or C++ APIs, while its core functionality is provided by a C++ backend. The API provides an interface for manipulating tensors (N-dimensional arrays) similar to Numpy, and includes automatic differentiation capabilities for computing gradients for use in optimization routines.

The library comes with a large number of built-in operations, including matrix multiplications, convolutions, pooling and activation functions, loss functions, optimizers, and many more. Once a graph of computations has been defined, TensorFlow enables it to be executed efficiently and portably on desktop, server, and mobile platforms.

To run the example code below, first change to your TensorFlow directory:

$ cd (tensorflow directory) 
$ git clone -b update-models-1.0

Image recognition is one of the tasks at which deep learning excels. Recognizing images seems easy for the human brain, but it is a challenging task for a computer. Over the past few years, however, advances have been significant enough that models can now surpass human ability in some cases. Convolutional neural networks (CNNs) make this possible, and ongoing research continues to demonstrate steady progress in computer vision, validated against ImageNet, an academic benchmark for computer vision.


First, let’s run the following commands and see what computer vision can do:

$ cd (tensorflow directory)/models/tutorials/image/imagenet
$ python

The script downloads the trained Inception-v3 model the first time the program is run. You'll need about 200 MB of free space available on your hard disk. The command above classifies a supplied image of a panda bear (found in /tmp/imagenet/cropped_panda.jpg), and a successful run returns results that look like:

giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.89107)
indri, indris, Indri indri, Indri brevicaudatus (score = 0.00779)
lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00296)
custard apple (score = 0.00147)
earthstar (score = 0.00117)
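
If you want to post-process these results, each line can be split into its label text and its score. The sketch below assumes every line follows the "labels (score = value)" layout shown above; the names LINE_PATTERN and parse_prediction are hypothetical and not part of the tutorial code.

```python
import re

# Matches lines such as "custard apple (score = 0.00147)".
LINE_PATTERN = re.compile(r"^(?P<labels>.+) \(score = (?P<score>[0-9.]+)\)$")


def parse_prediction(line):
    """Split one output line into (label text, score as float)."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        raise ValueError("unrecognized prediction line: " + repr(line))
    return match.group("labels"), float(match.group("score"))


if __name__ == "__main__":
    sample = [
        "custard apple (score = 0.00147)",
        "earthstar (score = 0.00117)",
    ]
    for labels, score in map(parse_prediction, sample):
        print("{:.5f}  {}".format(score, labels))
```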

You may also test other JPEG images by using the --image_file argument:

$ python --image_file <path to the JPEG file>
(e.g. python --image_file /tmp/imagenet/cropped_panda.jpg)


CIFAR-10 classification is a common benchmark task in machine learning. The task is to classify RGB 32x32 pixel images across 10 categories (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck).

The model used references the architecture described by Alex Krizhevsky, with a few differences in the top few layers. It is a multi-layer architecture consisting of alternating convolutions and nonlinearities, followed by fully connected layers leading into a softmax classifier.
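
For readers unfamiliar with the final softmax step, the plain-Python sketch below shows what a softmax does to the raw outputs (logits) of the last fully connected layer: it turns them into positive scores that sum to 1. It is an illustration only, not the tutorial's implementation, and the function name softmax is our own.

```python
import math


def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing on large inputs.
    largest = max(logits)
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


if __name__ == "__main__":
    # Three hypothetical class scores; the largest logit gets the largest share.
    for probability in softmax([2.0, 1.0, 0.1]):
        print("{:.3f}".format(probability))
```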

First, let’s train the model:

$ cd (tensorflow directory)/models/tutorials/image/cifar10
$ python

If successful, you will see something similar to what's listed below:

Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
2017-03-06 14:59:09.089282: step 10230, loss = 2.12 (1809.1 examples/sec; 0.071 sec/batch)
2017-03-06 14:59:09.760439: step 10240, loss = 2.12 (1902.4 examples/sec; 0.067 sec/batch)
2017-03-06 14:59:10.417867: step 10250, loss = 2.02 (1931.8 examples/sec; 0.066 sec/batch)
2017-03-06 14:59:11.097919: step 10260, loss = 2.04 (1900.3 examples/sec; 0.067 sec/batch)
2017-03-06 14:59:11.754801: step 10270, loss = 2.05 (1919.6 examples/sec; 0.067 sec/batch)
2017-03-06 14:59:12.416152: step 10280, loss = 2.08 (1942.0 examples/sec; 0.066 sec/batch)

Congratulations, you have just started training your first model.

Following the training, you can evaluate how well the trained model performs by using the script. It calculates the precision at 1: how often the top prediction matches the true label of the image.

$ python

If successful, you will see something similar to what's listed below:

2017-03-06 15:34:27.604924: precision @ 1 = 0.499
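
To make the metric concrete, precision at 1 can be computed as the fraction of images whose top prediction equals the true label. The sketch below illustrates that definition on made-up labels; it is not the evaluation script's own code, and the function name precision_at_1 is hypothetical.

```python
def precision_at_1(top_predictions, true_labels):
    """Fraction of examples whose top prediction matches the true label."""
    if len(top_predictions) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not true_labels:
        raise ValueError("at least one example is required")
    correct = sum(1 for predicted, actual in zip(top_predictions, true_labels)
                  if predicted == actual)
    return correct / float(len(true_labels))


if __name__ == "__main__":
    # Made-up example: three of the four top predictions are correct.
    predicted = ["cat", "dog", "ship", "frog"]
    actual = ["cat", "dog", "truck", "frog"]
    print(precision_at_1(predicted, actual))
```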


Next, let’s revisit Google’s Inception v3 and get more involved with a deeper use case. Inception v3 is a cutting-edge convolutional network designed for image classification. Training this model from scratch is very intensive and can take from several days up to weeks. An alternative approach is to download the pre-trained model and re-train it on another dataset. We will walk through how this is done using the flowers dataset.

Download the flowers dataset:

$ cd ~
$ curl -O
$ tar xzf flower_photos.tgz
$ cd (tensorflow directory where you git clone from master)
$ python

Note: You can leave most options at their defaults. If your installed versions of cuDNN and/or CUDA differ from the defaults suggested by the configurator, enter the correct version numbers.

$ python tensorflow/examples/image_retraining/ --image_dir ~/flower_photos


  • If you encounter “ version `CXXABI_1.3.8' not found…”, try cp /usr/lib/x86_64-linux-gnu/ /home/<your home directory name>/anaconda3/lib/
  • If you encounter “...import error: no module named autograd…”, try pip install autograd.

Using the Retrained Model:

$ bazel build tensorflow/examples/image_retraining:label_image && \
bazel-bin/tensorflow/examples/image_retraining/label_image \
--graph=/tmp/output_graph.pb --labels=/tmp/output_labels.txt \
--output_layer=final_result:0 \

The evaluation script will return results that look like the following, giving the classification score for each class:

daisy (score = 0.99735)
sunflowers (score = 0.00193)
dandelion (score = 0.00059)
tulips (score = 0.00009)
roses (score = 0.00004)

For more details on using the retrained Inception v3 model, see the tutorial link.


Each of the models described in the previous section outputs either an execution time per minibatch or an average speed in examples per second. The speed can be converted to time per minibatch by dividing the batch size by the speed. The graphs show expected performance on systems with NVIDIA GPUs.
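
As a worked example of that conversion, the sketch below turns an examples-per-second figure into seconds per minibatch. The batch size of 128 is an assumption on our part (the training log above does not state it), although it is consistent with the logged pair of 1809.1 examples/sec and 0.071 sec/batch. The function name seconds_per_batch is hypothetical.

```python
def seconds_per_batch(examples_per_sec, batch_size):
    """Convert throughput in examples/sec into time per minibatch in seconds."""
    if examples_per_sec <= 0:
        raise ValueError("examples_per_sec must be positive")
    return batch_size / float(examples_per_sec)


if __name__ == "__main__":
    # Throughput from the first CIFAR-10 log line; batch size 128 is assumed.
    print(round(seconds_per_batch(1809.1, 128), 3))
```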



The Inception v3 model also supports training on multiple GPUs. The graph below shows the expected performance on 1, 2, and 4 Tesla GPUs per node.

TensorFlow Inception Benchmark

Recommended System Configurations

Hardware Configuration



  • CPU Architecture
  • System Memory: 32 GB
  • GPU Model: Tesla® P40 and P100



Software Configuration

Software stack

  • OS: Ubuntu 14.04
  • GPU Driver: 352.68 or newer
  • CUDA Toolkit: 8.0 or newer
  • cuDNN Library: v5.0 or newer


