Speed Up Training with GPU-Accelerated TensorFlow

Running Jobs

TensorFlow can be used via its Python or C++ APIs, while its core functionality is provided by a C++ backend. The API provides an interface for manipulating tensors (N-dimensional arrays) similar to NumPy, and includes automatic differentiation for computing the gradients used in optimization routines.

The library comes with a large number of built-in operations, including matrix multiplications, convolutions, pooling and activation functions, loss functions, optimizers, and many more. Once a graph of computations has been defined, TensorFlow enables it to be executed efficiently and portably on desktop, server, and mobile platforms.
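The graph-plus-gradients idea described above can be illustrated with a minimal sketch. The following is plain Python, not TensorFlow's actual implementation (which lives in the C++ backend): each operation records its inputs, building a graph, and a backward pass then applies the chain rule through that graph.

```python
# Minimal reverse-mode automatic differentiation over a computation
# graph -- a sketch of the idea only, far simpler than TensorFlow's
# real C++ implementation.

class Value:
    """A scalar node that records which operation produced it."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward_fn():
            self.grad += out.grad    # d(a+b)/da = 1
            other.grad += out.grad   # d(a+b)/db = 1
        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward_fn():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Topologically sort the graph, then propagate gradients backward.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward_fn()

# Gradient of f(x, y) = x*y + x at (3, 4):
x, y = Value(3.0), Value(4.0)
f = x * y + x
f.backward()
print(x.grad, y.grad)  # df/dx = y + 1 = 5.0, df/dy = x = 3.0
```

An optimizer then uses these gradients to update the variables, which is exactly the loop TensorFlow's built-in optimizers run for you.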

The example codes described in this section are designed to run on only a single GPU, with the exception of the Inception v3 model, which supports training on multiple GPUs within a system.


The TensorFlow Python package comes with a number of example models that demonstrate the library and some of its key features.

To get started, run the following command to train a simple convolutional neural network (CNN) that classifies images of hand-written digits from the MNIST dataset (on the first run, this automatically downloads the 12 MB MNIST dataset into ./data if it is not already there):

$ python -m tensorflow.models.image.mnist.convolutional

Step 8500 (epoch 9.89), 11.4 ms
Minibatch loss: 1.595, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.8%
Test error: 0.8%

This will train the model for 10 epochs (iterations over the whole dataset) and achieve a final test error of 0.8%. To see the Python code for this model, look in the Python site-packages directory; its exact location varies between systems.
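The fractional "epoch 9.89" in the training log above is simply step arithmetic: epoch = (step * batch_size) / train_size. A quick sanity check, assuming the example's defaults (a 55,000-image training set after the 5,000-image validation split, and BATCH_SIZE = 64):

```python
# Reconstruct the fractional epoch number printed in the training log.
# The dataset sizes and batch size below are the example's assumed
# defaults; adjust them if your copy of convolutional.py differs.
train_size = 55000   # 60,000 MNIST images minus a 5,000-image validation set
batch_size = 64      # BATCH_SIZE in convolutional.py
step = 8500

epoch = step * batch_size / train_size
print(round(epoch, 2))  # 9.89, matching "Step 8500 (epoch 9.89)" above
```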


If you cannot find your Python site packages directory, this command will print it out:

$ python -c 'import site; print("\n".join(site.getsitepackages()))'

The minibatch size used during training can be modified by changing the BATCH_SIZE variable near the top of convolutional.py.

LSTM model

Recurrent networks can also be implemented in TensorFlow. An implementation of a long short-term memory (LSTM) network, trained to predict words from the Penn Treebank (PTB) dataset, is available in the TensorFlow GitHub repository:

$ wget http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz
$ tar -xvf simple-examples.tgz
$ python tensorflow/tensorflow/models/rnn/ptb/ptb_word_lm.py --data_path=./simple-examples/data --model=small

Epoch: 13 Learning rate: 0.004
0.004 perplexity: 54.774 speed: 5755 wps
0.104 perplexity: 40.810 speed: 6282 wps
0.204 perplexity: 44.671 speed: 6294 wps
0.304 perplexity: 42.877 speed: 6306 wps
0.404 perplexity: 42.190 speed: 6307 wps
0.504 perplexity: 41.485 speed: 6308 wps
0.604 perplexity: 40.119 speed: 6310 wps
0.703 perplexity: 39.475 speed: 6313 wps
0.803 perplexity: 38.828 speed: 6312 wps
0.903 perplexity: 37.527 speed: 6311 wps
Epoch: 13 Train Perplexity: 36.733
Epoch: 13 Valid Perplexity: 122.366
Test Perplexity: 117.939
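The perplexity values in the log are the exponential of the average per-word cross-entropy (negative log-likelihood): a model that assigned every word probability 1/N would have perplexity exactly N, so lower is better. A minimal sketch of the computation:

```python
import math

def perplexity(word_probs):
    """Perplexity from the probabilities the model assigned to the
    true words: exp of the mean negative log-likelihood per word."""
    nll = -sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(nll)

# Uniform guessing over a 10,000-word vocabulary gives perplexity 10,000;
# a perfect model (probability 1.0 on every true word) gives 1.0.
print(round(perplexity([1e-4] * 5)))  # 10000
print(perplexity([1.0] * 3))          # 1.0
```

By this measure, the small PTB model's test perplexity of ~118 means its average uncertainty per word is equivalent to choosing uniformly among about 118 words.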

Inception v3

Google’s Inception v3 network is a cutting-edge convolutional network designed for image classification. Training this model from scratch is computationally intensive and can take from several days to weeks. An alternative approach is to download a pre-trained model and then re-train (fine-tune) it on another dataset.

First, download the pre-trained Inception v3 model, which includes the checkpoint file model.ckpt-157585:

$ export INCEPTION_DIR=/data # Change this to your preferred data location
$ curl -O http://download.tensorflow.org/models/image/imagenet/inception-v3-2016-03-01.tar.gz
$ tar -xvf inception-v3-2016-03-01.tar.gz -C $INCEPTION_DIR

Next, clone the TensorFlow models repository:

$ git clone https://github.com/tensorflow/models.git tensorflow-models
$ cd tensorflow-models/inception

A dataset containing labeled images of flowers will be used to re-train the network. Follow these steps to download and preprocess the 218 MB flowers dataset:

$ export FLOWERS_DIR=/data/flowers # Change this to your preferred data location
$ mkdir -p $FLOWERS_DIR/data
$ bazel build inception/download_and_preprocess_flowers
$ bazel-bin/inception/download_and_preprocess_flowers $FLOWERS_DIR/data
# Ignore error ".../build_image_data: No such file or directory"
$ python inception/data/build_image_data.py \
    --train_directory=$FLOWERS_DIR/data/raw-data/train/ \
    --validation_directory=$FLOWERS_DIR/data/raw-data/validation/ \
    --output_directory=$FLOWERS_DIR/data \
    --labels_file=$FLOWERS_DIR/data/raw-data/labels.txt
Finished writing all 500 images in data set.
Finished writing all 3170 images in data set.
$ cd -

This downloads the 218 MB flowers image dataset and preprocesses it into training and validation sets. The re-training procedure can then be executed with the following steps (the additional python invocation above works around a dependency on the Bazel build system):

$ mkdir -p $FLOWERS_DIR/train
$ bazel build inception/flowers_train
$ bazel-bin/inception/flowers_train \
    --train_dir=$FLOWERS_DIR/train \
    --data_dir=$FLOWERS_DIR/data \
    --pretrained_model_checkpoint_path=$INCEPTION_DIR/inception-v3/model.ckpt-157585 \
    --fine_tune=True \
    --initial_learning_rate=0.001 \
    --input_queue_memory_factor=1 \
    --max_steps=500 \
    --num_gpus=1 \
    --batch_size=64
step 450, loss = 1.21 (21.7 examples/sec; 2.951 sec/batch)
step 460, loss = 1.19 (21.6 examples/sec; 2.964 sec/batch)
step 470, loss = 1.07 (21.8 examples/sec; 2.931 sec/batch)
step 480, loss = 1.11 (21.7 examples/sec; 2.950 sec/batch)
step 490, loss = 1.24 (21.7 examples/sec; 2.956 sec/batch)

(Training can also be run on multiple GPUs by setting the --num_gpus=N option.) The re-trained model can now be evaluated on the validation dataset:

$ mkdir -p $FLOWERS_DIR/eval
$ bazel build inception/flowers_eval
$ bazel-bin/inception/flowers_eval \
    --eval_dir=$FLOWERS_DIR/eval \
    --data_dir=$FLOWERS_DIR/data \
    --subset=validation \
    --num_examples=500 \
    --checkpoint_dir=$FLOWERS_DIR/train \
    --input_queue_memory_factor=1 \
    --run_once
Successfully loaded model from /data/flowers/train/model.ckpt-499 at step=499.
starting evaluation on (validation).
precision @ 1 = 0.8574 recall @ 5 = 0.9980 [512 examples]

Here the top-1 (i.e., single-guess) classification accuracy is about 86% after re-training the model for 500 steps. The accuracy can be improved further by training for more steps. For more details on using the Inception v3 model, see the README document.
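The two metrics in the evaluation log are straightforward to state: "precision @ 1" is the fraction of images whose top-ranked prediction equals the true label, and "recall @ 5" is the fraction whose true label appears anywhere in the top five predictions. A sketch of the computation, over hypothetical ranked predictions (class indices, best first):

```python
def precision_at_1(ranked_preds, labels):
    """Fraction of examples whose top-ranked prediction is the true label."""
    hits = sum(1 for preds, y in zip(ranked_preds, labels) if preds[0] == y)
    return hits / len(labels)

def recall_at_5(ranked_preds, labels):
    """Fraction of examples whose true label appears in the top 5."""
    hits = sum(1 for preds, y in zip(ranked_preds, labels) if y in preds[:5])
    return hits / len(labels)

# Hypothetical ranked predictions for 4 images and their true labels:
ranked = [[2, 0, 1, 3, 4], [1, 2, 0, 3, 4], [0, 3, 2, 1, 4], [4, 1, 0, 2, 3]]
labels = [2, 0, 0, 3]

print(precision_at_1(ranked, labels))  # 0.5 (images 1 and 3 correct at top-1)
print(recall_at_5(ranked, labels))     # 1.0 (every true label is in the top 5)
```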