GPU-Accelerated Torch

Get started today with this GPU Ready Apps Guide.

Torch

Torch is a deep learning framework with wide support for machine learning algorithms. It's open-source, simple to use, and efficient, thanks to an easy and fast scripting language, LuaJIT, and an underlying C / CUDA implementation.

Torch offers popular neural network and optimization libraries that are easy to use, yet provide maximum flexibility to build complex neural network topologies.

It also runs up to 70% faster on the latest NVIDIA Pascal™ GPUs, so you can now train networks in hours, instead of days.

Installation

System Requirements

The GPU-accelerated version of Torch has the following requirements:

  • Ubuntu 14.x (or any 64-bit Linux if you choose to build from source)
  • NVIDIA® CUDA® 7.5 or newer (for Pascal GPUs, CUDA 8.0 or newer)
  • cuDNN v5.0 or newer

You will also need an NVIDIA GPU that supports compute capability 3.0 or higher. The NVIDIA Tesla® P100 and M40 are designed for machine learning workloads. We recommend the P100 or M40 for servers and the TITAN X for PCs.
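As a quick pre-flight check of the CUDA requirement above, the following shell sketch compares the installed toolkit version against a minimum. It assumes nvcc is on the PATH and that GNU sort with the -V option is available. The helper names version_ge and cuda_version and the REQUIRED_CUDA variable are illustrative, not part of any NVIDIA tool.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the CUDA requirement; names are illustrative.

# Succeeds when version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Reads text shaped like `nvcc --version` output on stdin and prints "X.Y"
# taken from the token "release X.Y".
cuda_version() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Minimum toolkit version; set REQUIRED_CUDA=8.0 when targeting Pascal GPUs.
required="${REQUIRED_CUDA:-7.5}"

if command -v nvcc >/dev/null 2>&1; then
    installed="$(nvcc --version | cuda_version)"
    if version_ge "$installed" "$required"; then
        echo "CUDA $installed satisfies the >= $required requirement"
    else
        echo "CUDA $installed is older than $required"
    fi
else
    echo "nvcc not found; install the CUDA toolkit first"
fi
```

The same version_ge helper can be reused for the cuDNN or driver version if you obtain those version strings by other means.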

Download and Installation Instructions

Torch is built around LuaRocks, a package manager for Lua, and has a modular structure. A common collection of Torch modules is distributed under a BSD open-source license on GitHub. We recommend using the pre-built Torch Debian package (Ubuntu 14.x only).

BELOW IS A BRIEF SUMMARY OF THE INSTALLATION PROCEDURE

1. Add CUDA and machine learning repositories to apt-get

Get access to machine learning packages from NVIDIA by downloading and installing the cuda-repo-ubuntu1404 and nvidia-machine-learning-repo packages. Run the following commands to get access to the required repositories:

CUDA_REPO_PKG=cuda-repo-ubuntu1404_7.5-18_amd64.deb && wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/$CUDA_REPO_PKG && sudo dpkg -i $CUDA_REPO_PKG

ML_REPO_PKG=nvidia-machine-learning-repo_4.0-2_amd64.deb && wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1404/x86_64/$ML_REPO_PKG && sudo dpkg -i $ML_REPO_PKG

sudo apt-get update

This provides access to the NVIDIA repositories containing Ubuntu packages for CUDA and ML, like cuda-toolkit-8-0, digits, caffe-nv, torch, and libcudnn5.

2. Install Torch packages via apt-get

Now that you've configured access to NVIDIA ML repositories, install the Torch package and its dependencies:

sudo apt-get install libcudnn5 libcudnn5-dev torch7-nv
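One way to confirm the installation, sketched below, is to ask dpkg for the status of each package and then have Torch report how many GPUs it can see. This assumes the package names from the install command above and that the cutorch module is part of the installed Torch. The check_pkg function is an illustrative helper, not an NVIDIA or Torch tool.

```shell
#!/usr/bin/env bash
# Illustrative post-install check; check_pkg is a hypothetical helper name.

# Prints whether a Debian package is installed, based on dpkg-query's status.
check_pkg() {
    if command -v dpkg-query >/dev/null 2>&1 &&
       dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q "install ok installed"; then
        echo "$1: installed"
    else
        echo "$1: missing"
        return 1
    fi
}

# Report on every package named in the install step.
for pkg in libcudnn5 libcudnn5-dev torch7-nv; do
    check_pkg "$pkg" || true
done

# Ask Torch how many CUDA devices it can see (assumes cutorch is available).
if command -v th >/dev/null 2>&1; then
    th -e "require 'cutorch'; print('GPUs visible: ' .. cutorch.getDeviceCount())"
else
    echo "th not found on PATH"
fi
```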

If you have a different system, or otherwise choose to build Torch from source, please see the Torch Cheat Sheet.

Training Models

Once Torch has been installed on your system, it can be run as follows:

th
 ______             __   |  Torch7                                          
/_  __/__  ________/ /   |  Scientific computing for Lua. 
 / / / _ \/ __/ __/ _ \  |  Type ? for help                                 
/_/  \___/_/  \__/_//_/  |  https://github.com/torch          
                         |  http://torch.ch                   
    
th>

We'll use ResNet as an example here. For instructions on downloading and installing the ResNet module and the training image data set, please visit the Facebook ResNet Training page.

The steps below assume you have successfully installed both.

To run the training, change to the directory of the ResNet clone and run the main.lua script with th.

By default, the script runs ResNet-34 on ImageNet with a single GPU and two data-loader threads:

th main.lua -data [imagenet-folder with train and val folders]

To train ResNet-50 on four GPUs and eight CPU threads:

th main.lua -depth 50 -batchSize 256 -nGPU 4 -nThreads 8 -shareGradInput true -data [imagenet-folder]
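To make runs like the two above easier to repeat, the command line can be assembled from a few variables. The sketch below uses only the options already shown (-depth, -batchSize, -nGPU, -nThreads, -shareGradInput, -data). The function name resnet_cmd and the path /data/imagenet are assumptions for illustration.

```shell
#!/usr/bin/env bash
# resnet_cmd is a hypothetical helper that prints a training command line.
# Arguments: depth, batch size, number of GPUs, loader threads, dataset folder,
# and an optional sixth argument "true" to add -shareGradInput true.
resnet_cmd() {
    local depth="$1" batch="$2" ngpu="$3" nthreads="$4" data="$5"
    local share="${6:-false}"
    local cmd="th main.lua -depth $depth -batchSize $batch -nGPU $ngpu -nThreads $nthreads"
    if [ "$share" = "true" ]; then
        cmd="$cmd -shareGradInput true"
    fi
    echo "$cmd -data $data"
}

# Print the four-GPU ResNet-50 command; /data/imagenet is a placeholder path.
resnet_cmd 50 256 4 8 /data/imagenet true
```

The function only prints the command. Run the printed line from the ResNet clone directory once it looks right.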

Trained models and additional resources are available from the ResNet Training Page.

Benchmarks

This section demonstrates GPU acceleration for selected neural networks.

IMAGE TRAINING PERFORMANCE ON ALEXNET AND GOOGLENET

AlexNet, a convolutional neural network, was developed to classify 1.2M images into 1,000 different categories.

GoogLeNet is a newer deep learning model that uses a deeper and wider network to provide higher image classification accuracy.

NVIDIA Tesla P100 PCIe Performance
NVIDIA Tesla M40 Performance
NVIDIA Tesla K80 Performance

Recommended System Configurations

Hardware Configuration

PC

Parameter           Specs
CPU Architecture    x86_64
System Memory       8-32 GB
CPUs                1
GPU Model           NVIDIA® TITAN X
GPUs                1-2

Servers

Parameter           Specs
CPU Architecture    x86_64
System Memory       32 GB
CPUs/Nodes          1-2
GPU Model           Tesla® P100, Tesla® M40
GPUs/Node           1-4

Software Configuration

Software stack

Parameter           Version
OS                  Ubuntu 14.04
GPU Driver          352.68 or newer
CUDA Toolkit        8.0 or newer
cuDNN Library       v5.0 or newer
Python              2.7
