Get started today with this GPU-Ready Apps guide.
TensorFlow is a software library for designing and deploying numerical computations, with a key focus on applications in machine learning. The library allows algorithms to be described as a graph of connected operations that can be executed on various GPU-enabled platforms ranging from portable devices to desktops to high-end servers.
TensorFlow runs up to 50% faster on the latest Pascal GPUs and scales well across multiple GPUs, so you can train models in hours instead of days.
The GPU-enabled version of TensorFlow has the following requirements: 64-bit Linux, Python, CUDA 8.0, and cuDNN v5.1 (cuDNN v6 if on TensorFlow v1.3).
You will also need an NVIDIA GPU supporting compute capability 3.0 or higher.
TensorFlow is distributed under an Apache v2 open source license on GitHub. This guide will walk through installing TensorFlow on an Ubuntu 16.04 machine with one or more NVIDIA GPUs.
The TensorFlow site is a great resource for how to install with virtualenv, with Docker, and from sources on the latest released versions.
BELOW IS A BRIEF SUMMARY OF THE INSTALLATION PROCEDURE
1. Update/install NVIDIA drivers.
Install up-to-date NVIDIA drivers for your system.
$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt update (re-run if you see any warning/error messages)
$ sudo apt-get install nvidia-375 (press Tab after typing "nvidia-" to see the latest available version; do not use 378, which may cause login loops)
Reboot to let the graphics driver take effect.
2. Install and test CUDA.
To use TensorFlow with NVIDIA GPUs, the first step is to install the CUDA Toolkit by following the official documentation. The steps for CUDA 8.0 are summarized below for quick reference:
Navigate to https://developer.nvidia.com/cuda-downloads.
Select Linux, x86_64, Ubuntu, 16.04, deb (local).
$ sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb (this is the deb file you downloaded)
$ sudo apt-get update
$ sudo apt-get install cuda
If you encounter a message suggesting that you re-run sudo apt-get update, do so and then re-run sudo apt-get install cuda.
$ export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
$ export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Test your CUDA installation:
$ cd /usr/local/cuda-8.0/samples/5_Simulations/nbody
$ sudo make
$ ./nbody
If successful, a new window will pop up running the n-body simulation.
3. Install cuDNN.
Once the CUDA Toolkit is installed, download the cuDNN v5.1 Library (cuDNN v6 if on TF v1.3) for Linux and install it by following the official documentation. (Note: You will need to register for the Accelerated Computing Developer Program.) The steps for cuDNN v5.1 are summarized below for quick reference:
Once downloaded, navigate to the directory containing cuDNN:
$ tar -xzvf cudnn-8.0-linux-x64-v5.1.tgz
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
Note: Steps above are similar for cuDNN v6.
Now that the prerequisites are installed, we can build and install TensorFlow.
4. Prepare TensorFlow dependencies and required packages.
$ sudo apt-get install libcupti-dev
5. Install TensorFlow (GPU-accelerated version).
$ pip install tensorflow-gpu
6. Verify a successful installation.
Let’s quickly verify a successful installation by first closing all open terminals and opening a new one.
Change directory (cd) to any directory on your system other than the tensorflow subdirectory from which you invoked the configure command.
Invoke Python: type python at the command line.
Input the following short program:
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print(sess.run(hello))
You should see “Hello, TensorFlow!”. Congratulations! You may also enter print(tf.__version__) to see the installed TensorFlow version.
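To confirm that TensorFlow can actually see the GPU, you can also list the devices it has found. This is a minimal check using the TensorFlow 1.x device_lib utility (the exact device names printed will depend on your system):

>>> from tensorflow.python.client import device_lib
>>> device_lib.list_local_devices()

A GPU-accelerated installation lists a device of type GPU (for example /device:GPU:0 or /gpu:0, depending on the TensorFlow version) alongside the CPU.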
TensorFlow can be used via Python or C++ APIs, while its core functionality is provided by a C++ backend. The API provides an interface for manipulating tensors (N-dimensional arrays) similar to NumPy, and includes automatic differentiation capabilities for computing gradients for use in optimization routines.
The library comes with a large number of built-in operations, including matrix multiplications, convolutions, pooling and activation functions, loss functions, optimizers, and many more. Once a graph of computations has been defined, TensorFlow enables it to be executed efficiently and portably on desktop, server, and mobile platforms.
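As a small illustration of this workflow, the following sketch (written against the TensorFlow 1.x graph API used in this guide; the names and values are arbitrary) defines a tiny graph, asks TensorFlow to differentiate it automatically, and executes it in a session:

import tensorflow as tf

# Define a small graph: y = x^2 + 3x, and its gradient dy/dx = 2x + 3.
x = tf.placeholder(tf.float32, shape=())
y = x * x + 3.0 * x
grad = tf.gradients(y, [x])[0]      # automatic differentiation

with tf.Session() as sess:
    # Nothing is computed until the graph is executed in a session.
    y_val, grad_val = sess.run([y, grad], feed_dict={x: 2.0})
    print(y_val, grad_val)          # prints 10.0 and 7.0

The same graph, unchanged, can be placed on a CPU or a GPU; TensorFlow handles the device placement of the operations.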
To run the example code below, first change to your TensorFlow directory:
$ cd (tensorflow directory)
$ git clone -b update-models-1.0 https://github.com/tensorflow/models
Image recognition is one of the tasks at which Deep Learning excels. While human brains make recognizing images seem easy, it is a challenging task for a computer. However, there have been significant advances over the past few years, to the point of surpassing human performance. What makes this possible is the convolutional neural network (CNN), and ongoing research has demonstrated steady advances in computer vision, validated against ImageNet, an academic benchmark for computer vision.
First, let’s run the following commands and see what computer vision can do:
$ cd (tensorflow directory)/models/tutorials/image/imagenet
$ python classify_image.py
classify_image.py downloads the trained Inception-v3 model from tensorflow.org the first time the program is run. You'll need about 200 MB of free space on your hard disk. The command above classifies a supplied image of a panda (found in /tmp/imagenet/cropped_panda.jpg), and a successful execution of the model returns results that look like:
giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.89107)
indri, indris, Indri indri, Indri brevicaudatus (score = 0.00779)
lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00296)
custard apple (score = 0.00147)
earthstar (score = 0.00117)
You may also test other JPEG images by using the --image_file argument:
$ python classify_image.py --image_file <path to the JPEG file>
(e.g. python classify_image.py --image_file /tmp/imagenet/cropped_panda.jpg)
CIFAR-10 classification is a common benchmark task in machine learning. The task is to classify RGB 32x32 pixel images across 10 categories (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck).
The model used references the architecture described by Alex Krizhevsky, with a few differences in the top few layers. It is a multi-layer architecture consisting of alternating convolutions and nonlinearities, followed by fully connected layers leading into a softmax classifier.
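To make that description concrete, here is a heavily simplified sketch of the same pattern (alternating convolutions and nonlinearities, followed by fully connected layers and a softmax classifier) using the TensorFlow 1.x layers API. It is not the actual cifar10 tutorial code, and the filter counts and layer sizes are illustrative only:

import tensorflow as tf

images = tf.placeholder(tf.float32, [None, 32, 32, 3])   # batches of 32x32 RGB images

# Alternating convolutions, ReLU nonlinearities, and pooling.
conv1 = tf.layers.conv2d(images, filters=64, kernel_size=5, padding='same', activation=tf.nn.relu)
pool1 = tf.layers.max_pooling2d(conv1, pool_size=2, strides=2)
conv2 = tf.layers.conv2d(pool1, filters=64, kernel_size=5, padding='same', activation=tf.nn.relu)
pool2 = tf.layers.max_pooling2d(conv2, pool_size=2, strides=2)

# Fully connected layers leading into a 10-way softmax classifier.
flat = tf.reshape(pool2, [-1, 8 * 8 * 64])
fc1 = tf.layers.dense(flat, 384, activation=tf.nn.relu)
logits = tf.layers.dense(fc1, 10)            # one logit per CIFAR-10 class
probabilities = tf.nn.softmax(logits)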
First, let’s train the model:
$ cd (tensorflow directory)/models/tutorials/image/cifar10
$ python cifar10_train.py
If successful, you will see something similar to what's listed below:
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
...
2017-03-06 14:59:09.089282: step 10230, loss = 2.12 (1809.1 examples/sec; 0.071 sec/batch)
2017-03-06 14:59:09.760439: step 10240, loss = 2.12 (1902.4 examples/sec; 0.067 sec/batch)
2017-03-06 14:59:10.417867: step 10250, loss = 2.02 (1931.8 examples/sec; 0.066 sec/batch)
2017-03-06 14:59:11.097919: step 10260, loss = 2.04 (1900.3 examples/sec; 0.067 sec/batch)
2017-03-06 14:59:11.754801: step 10270, loss = 2.05 (1919.6 examples/sec; 0.067 sec/batch)
2017-03-06 14:59:12.416152: step 10280, loss = 2.08 (1942.0 examples/sec; 0.066 sec/batch)
...
Congratulations, you have just started training your first model.
Following the training, you can evaluate how well the trained model performs by using the cifar10_eval.py script. It calculates the precision at 1: how often the top prediction matches the true label of the image.
$ python cifar10_eval.py
2017-03-06 15:34:27.604924: precision @ 1 = 0.499
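To make the metric concrete, precision at 1 is simply the fraction of test images whose highest-scoring prediction equals the true label. Here is a minimal sketch in plain NumPy (with made-up scores, not the cifar10_eval.py code):

import numpy as np

# One row of class scores per image, and the true class index for each image.
scores = np.array([[0.1, 2.3, 0.5],
                   [1.9, 0.2, 0.1],
                   [0.3, 0.4, 2.2]])
labels = np.array([1, 2, 2])

top_prediction = np.argmax(scores, axis=1)          # highest-scoring class per image
precision_at_1 = np.mean(top_prediction == labels)  # 2 of 3 correct -> about 0.667
print(precision_at_1)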
Next, let’s revisit Google’s Inception v3 and get more involved with a deeper use case. Inception v3 is a cutting-edge convolutional network designed for image classification. Training this model from scratch is very compute-intensive and can take from several days up to weeks. An alternative approach is to download a pre-trained model and re-train it on another dataset. We will walk through how this is done using the flowers dataset.
Download the flowers dataset:
$ cd ~
$ curl -O http://download.tensorflow.org/example_images/flower_photos.tgz
$ tar xzf flower_photos.tgz
$ cd (tensorflow directory where you cloned from master)
$ python configure.py
Note: You can leave most options at their defaults. Enter the correct cuDNN and/or CUDA version numbers if the versions you have installed differ from the defaults suggested by the configure script.
$ python tensorflow/examples/image_retraining/retrain.py --image_dir ~/flower_photos
Using the Retrained Model:
$ bazel build tensorflow/examples/image_retraining:label_image && \
bazel-bin/tensorflow/examples/image_retraining/label_image \
--graph=/tmp/output_graph.pb --labels=/tmp/output_labels.txt \
--output_layer=final_result:0 \
--image=$HOME/flower_photos/daisy/21652746_cc379e0eea_m.jpg
The script returns results that look as follows, giving the classification scores for the supplied image:
daisy (score = 0.99735)
sunflowers (score = 0.00193)
dandelion (score = 0.00059)
tulips (score = 0.00009)
roses (score = 0.00004)
For more details on using the retrained Inception v3 model, see the tutorial link.
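If you would rather run the retrained graph directly from Python instead of through the Bazel-built label_image binary, a sketch along the following lines can classify a single image. It assumes the output paths written by retrain.py above, and the input/output tensor names commonly used by the retrained Inception graph ('DecodeJpeg/contents:0' and 'final_result:0'); these may differ between TensorFlow versions, and the test image path is hypothetical:

import tensorflow as tf

graph_path = '/tmp/output_graph.pb'     # written by retrain.py
labels_path = '/tmp/output_labels.txt'  # written by retrain.py
image_path = '/tmp/test_flower.jpg'     # hypothetical test image

# Load the retrained graph definition into the default graph.
with tf.gfile.GFile(graph_path, 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
tf.import_graph_def(graph_def, name='')

labels = [line.strip() for line in tf.gfile.GFile(labels_path)]

with tf.Session() as sess:
    image_data = tf.gfile.GFile(image_path, 'rb').read()
    # Feed raw JPEG bytes in, fetch the retrained softmax output.
    predictions = sess.run('final_result:0', {'DecodeJpeg/contents:0': image_data})[0]
    for i in predictions.argsort()[::-1][:5]:
        print(labels[i], predictions[i])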
Each of the models described in the previous section outputs either an execution time per minibatch or an average speed in examples/second, which can be converted to time per minibatch by dividing the batch size by that speed. The graphs show expected performance on systems with NVIDIA GPUs.
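For example, using the cifar10_train.py log shown earlier and assuming its default batch size of 128 (adjust if you changed the flag), the conversion is just:

batch_size = 128             # cifar10_train.py default; change to match your run
examples_per_sec = 1902.4    # taken from one line of the training log above
sec_per_batch = batch_size / examples_per_sec
print(sec_per_batch)         # about 0.067, matching the sec/batch column in the log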
The Inception v3 model also supports training on multiple GPUs. The graph below shows the expected performance on 1, 2, and 4 Tesla GPUs per node.