Apache MXNet

Apache MXNet is a flexible and scalable deep learning framework that supports many deep learning models and programming languages and features a development interface that’s highly regarded for its ease of use.


What is Apache MXNet?

MXNet is an open-source deep learning framework that allows you to define, train, and deploy deep neural networks on a wide array of devices, from cloud infrastructure to mobile devices. It’s highly scalable, allowing for fast model training, and supports a flexible programming model and multiple languages.

MXNet lets you mix symbolic and imperative programming flavors to maximize both efficiency and productivity. It’s built on a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer on top of that makes symbolic execution fast and memory efficient.

The MXNet library is portable and lightweight. It’s accelerated on NVIDIA Pascal™ GPUs and scales across multiple GPUs and multiple nodes, allowing you to train models faster.

Why Apache MXNet?

Apache MXNet offers the following key features and benefits:

  • Hybrid frontend: The imperative symbolic hybrid Gluon API provides an easy way to prototype, train, and deploy models without sacrificing training speed. Developers need just a few lines of Gluon code to build linear regression, CNN, and recurrent LSTM models for such uses as object detection, speech recognition, and recommendation engines.
  • Scalability: Designed from the ground up for cloud infrastructure, MXNet uses a distributed parameter server to spread deep learning workloads across multiple GPUs or CPUs with near-linear scaling and auto-scaling. Tests run by Amazon Web Services found that MXNet performed 109 times faster across a cluster of 128 GPUs than with a single GPU. This ability to scale to multiple GPUs across multiple hosts, along with development speed and portability, is why AWS adopted MXNet as its deep learning framework of choice over alternatives such as TensorFlow, Theano, and Torch.
  • Ecosystem: MXNet has toolkits and libraries for computer vision, natural language processing, time series, and more.
  • Languages: MXNet-supported languages include Python, C++, R, Scala, Julia, MATLAB, and JavaScript. MXNet also compiles to C++, producing a lightweight neural network model representation that can run on everything from low-powered devices like Raspberry Pi to cloud servers.

Features of Apache MXNet.
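To put the scalability figure quoted above in perspective, the short sketch below computes scaling efficiency, meaning the fraction of ideal linear speedup actually achieved. The function name scaling_efficiency is a hypothetical helper introduced here for illustration; only the 109x and 128-GPU figures come from the text.

```python
def scaling_efficiency(speedup: float, num_devices: int) -> float:
    """Return the fraction of ideal (linear) speedup that was achieved."""
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    return speedup / num_devices


# Figures quoted in the text: 109x faster on a cluster of 128 GPUs.
efficiency = scaling_efficiency(109, 128)
print(f"{efficiency:.1%}")  # prints 85.2%
```

An efficiency of roughly 85% of the ideal is what "near-linear" means in practice here: each added GPU contributes most, but not all, of a full GPU's worth of throughput.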

How does MXNet work?

Created by a consortium of academic institutions and incubated at the Apache Software Foundation, MXNet (or “mix-net”) was designed to blend the advantages of different programming approaches to deep learning model development—imperative, which specifies exactly “how” computation is performed, and declarative or symbolic, which focuses on “what” should be performed.

MXNet blends the advantages of different programming approaches.

Image reference: https://www.cs.cmu.edu/~muli/file/mxnet-learning-sys.pdf

Imperative Programming Mode

NDArray, MXNet’s imperative interface, is its primary tool for storing and transforming data. NDArray represents and manipulates the inputs and outputs of a model as multi-dimensional arrays. It is similar to NumPy’s ndarray, but it can also run on GPUs to accelerate computing.

Imperative programming has the advantage of being familiar to developers with procedural programming backgrounds, and it’s more natural for parameter updates and interactive debugging.

Symbolic Programming Mode

Neural networks transform input data by applying layers of nested functions, each governed by learnable parameters. Each layer consists of a linear function followed by a nonlinear transformation. The goal of deep learning is to optimize these parameters (weights and biases) by computing the partial derivatives (gradients) of a loss metric with respect to them. In forward propagation, each layer takes its inputs and passes its outputs to the nodes in the next layer until the output layer is reached, where the error of the prediction is calculated. With backpropagation, used within a process called gradient descent, the errors are sent back through the network and the weights are adjusted, improving the model.
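The loop described above can be sketched for a single neuron in plain Python. This is a minimal illustration under stated assumptions, not MXNet code: the sigmoid activation, squared-error loss, learning rate, step count, and the four-point dataset are all choices made here for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, b, x):
    # Linear function (w * x + b) followed by a nonlinear transformation.
    return sigmoid(w * x + b)


def loss_and_grads(w, b, data):
    """Mean squared-error loss and its gradients with respect to w and b."""
    loss = grad_w = grad_b = 0.0
    for x, y in data:
        y_hat = forward(w, b, x)           # forward propagation
        err = y_hat - y
        loss += 0.5 * err * err
        # Backpropagation via the chain rule; sigmoid'(z) = y_hat * (1 - y_hat).
        dz = err * y_hat * (1.0 - y_hat)
        grad_w += dz * x
        grad_b += dz
    n = len(data)
    return loss / n, grad_w / n, grad_b / n


def train(data, lr=0.5, steps=200):
    w, b = 0.0, 0.0
    history = []
    for _ in range(steps):
        loss, grad_w, grad_b = loss_and_grads(w, b, data)
        history.append(loss)
        w -= lr * grad_w                   # gradient descent update
        b -= lr * grad_b
    return w, b, history


# Toy dataset (assumed): negative inputs map to 0, positive inputs to 1.
data = [(-2.0, 0.0), (-1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
w, b, history = train(data)
print(f"first loss {history[0]:.4f}, last loss {history[-1]:.4f}")
```

A framework such as MXNet automates the gradient computation that loss_and_grads does by hand here, and applies it to networks with many layers.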

Graphs are data structures consisting of nodes (also called vertices) connected by edges. Every modern framework for deep learning is based on the concept of graphs, where neural networks are represented as a graph structure of computations.

MXNet symbolic programming allows functions to be defined abstractly through computation graphs. With symbolic programming, complex functions are first expressed in terms of placeholder values. Then these functions can be executed by binding them to real values. Symbolic programming also provides predefined neural network layers, which let you express large models concisely with less repetitive work and better performance.

Symbolic programming.

Image reference: https://www.cs.cmu.edu/~muli/file/mxnet-learning-sys.pdf

Symbolic programming has the following advantages:

  • The clear boundaries of a computation graph provide more optimization opportunities by the backend MXNet executor
  • It’s easier to specify the computation graph for neural network configurations
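The idea of defining a function over placeholders and binding real values later can be illustrated with a toy computation graph in plain Python. This is a sketch of the concept only: the Symbol class, var helper, and evaluate method are hypothetical names introduced here and are not MXNet's symbolic API.

```python
class Symbol:
    """A node in a toy computation graph. Building it computes nothing."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op          # "var", "add", or "mul"
        self.inputs = inputs  # child nodes (empty for placeholders)
        self.name = name      # placeholder name, used only when op == "var"

    def __add__(self, other):
        return Symbol("add", (self, other))

    def __mul__(self, other):
        return Symbol("mul", (self, other))

    def evaluate(self, **bindings):
        """Bind placeholders to real values and run the graph."""
        if self.op == "var":
            return bindings[self.name]
        left, right = (node.evaluate(**bindings) for node in self.inputs)
        return left + right if self.op == "add" else left * right


def var(name):
    """Create a placeholder node."""
    return Symbol("var", name=name)


# Step 1: describe the computation abstractly.
a, b, c = var("a"), var("b"), var("c")
expr = a * b + c

# Step 2: bind real values and execute.
print(expr.evaluate(a=2, b=3, c=4))  # prints 10
```

Because the whole graph exists before any value flows through it, a real backend can inspect and optimize it (for example by reusing memory or fusing operations) before execution, which is the first advantage listed above.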

Hybrid Programming Mode with the Gluon API

One of the key advantages of MXNet is its included hybrid programming interface, Gluon, which bridges the gap between the imperative and symbolic interfaces while keeping the capabilities and advantages of both. Gluon is an easy-to-learn API that produces fast, portable models. With the Gluon API, you can prototype your model imperatively using NDArray. Then you can switch to symbolic mode by calling hybridize for faster model training and inference. In symbolic mode, the model runs faster as an optimized graph with the backend MXNet executor and can be easily exported for inference in different language bindings like Java or C++.
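The switch from imperative to symbolic execution can be pictured with the toy below, written in plain Python. It is a conceptual sketch only and not Gluon: ToyBlock and Tracer are hypothetical classes, and the "graph" is just a recorded list of scalar operations that is traced once and then replayed.

```python
class Tracer:
    """Stand-in value that records operations instead of computing them."""

    def __init__(self, tape):
        self.tape = tape

    def __mul__(self, k):
        self.tape.append(("mul", k))
        return self

    def __add__(self, k):
        self.tape.append(("add", k))
        return self


class ToyBlock:
    def __init__(self):
        self._hybrid = False
        self._tape = None  # cached "graph", built on the first hybrid call

    def forward(self, x):
        # The model itself, written as ordinary imperative Python.
        return x * 3 + 1

    def hybridize(self):
        self._hybrid = True

    def __call__(self, x):
        if not self._hybrid:
            return self.forward(x)        # imperative: run the Python code
        if self._tape is None:            # first hybrid call: trace once
            tape = []
            self.forward(Tracer(tape))
            self._tape = tape
        for op, k in self._tape:          # later calls: replay the cached graph
            x = x * k if op == "mul" else x + k
        return x


net = ToyBlock()
print(net(2))      # imperative mode, prints 7
net.hybridize()
print(net(2))      # same result from the traced graph, prints 7
```

The design point is that the same forward definition serves both modes: it is easy to debug while imperative, and once traced it becomes a fixed structure that a backend could optimize or export.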

Why MXNet Is Better on GPUs

Architecturally, the CPU is composed of just a few cores with lots of cache memory that can handle a few software threads at a time. In contrast, a GPU is composed of hundreds of cores that can handle thousands of threads simultaneously.

The difference between a CPU and GPU.

Because neural nets are created from large numbers of identical neurons, they’re highly parallel by nature. This parallelism maps naturally to GPUs, providing a significant computation speed-up over CPU-only training. GPUs have become the platform of choice for training large, complex neural network-based systems for this reason. The parallel nature of inference operations also lends itself well to execution on GPUs.

With improved algorithms, bigger datasets, and GPU-accelerated computation, deep learning neural networks have become an indispensable tool for image recognition, speech recognition, language translation, and more in numerous industries. MXNet was developed with the goal of offering powerful tools to help developers exploit the full capabilities of GPUs and cloud computing.

Simply stated, the more GPUs you put to work on an MXNet training algorithm, the faster the job completes. The framework is a standout in scalable performance with nearly linear speed improvements as more GPUs are brought to bear. MXNet was also designed to scale automatically according to available GPUs, a plus for performance tuning.
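One common way that extra devices speed up training is data parallelism: each device computes gradients on its own shard of a batch, and the results are aggregated, which is the role a parameter server plays. The plain-Python sketch below illustrates that arithmetic under simplifying assumptions (a one-parameter linear model, equal-sized shards, sequential execution standing in for concurrent devices). All function names are hypothetical and this is not MXNet's distributed API.

```python
def gradient(w, batch):
    """Mean gradient of 0.5 * (w * x - y)**2 with respect to w over a batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def split(batch, num_workers):
    """Split a batch into equal-sized contiguous shards, one per worker."""
    if len(batch) % num_workers:
        raise ValueError("batch size must be divisible by num_workers")
    size = len(batch) // num_workers
    return [batch[i * size:(i + 1) * size] for i in range(num_workers)]


def data_parallel_gradient(w, batch, num_workers):
    shards = split(batch, num_workers)
    # On real hardware, each shard's gradient would be computed concurrently
    # on its own device; here they run one after another.
    worker_grads = [gradient(w, shard) for shard in shards]
    # Aggregation step: average the per-worker gradients.
    return sum(worker_grads) / num_workers


batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w = 0.5
single = gradient(w, batch)
parallel = data_parallel_gradient(w, batch, num_workers=2)
print(single, parallel)  # prints -11.25 -11.25
```

With equal-sized shards, the averaged result matches the single-device gradient, so each worker does only a fraction of the work per step. Communication during aggregation is the main reason real speedups are near-linear rather than perfectly linear.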

Use Cases

Smartphone Apps

MXNet is well-suited for image recognition, and its ability to support models that run on low-power, limited-memory platforms makes it a good choice for mobile phone deployment. Models built with MXNet have been shown to provide highly reliable image recognition results running natively on laptop computers. Combining local and cloud processors could enable powerful distributed applications in areas like augmented reality and object and scene identification.

Voice and image recognition applications also have intriguing possibilities for people with disabilities. For example, mobile apps could help vision-impaired people to better perceive their surroundings and people with hearing impairments to translate voice conversations into text.

Autonomous Vehicles

Self-driving cars and trucks must process an enormous amount of data to make decisions in near-real-time. The complex networks being developed to support fleets of autonomous vehicles use distributed processing to an unprecedented degree to coordinate everything from the braking decisions in a single car to traffic management across an entire city.

TuSimple—which is building an autonomous freight network of mapped routes that allow for autonomous cargo shipments across the southwestern U.S.—chose MXNet as its foundational platform for artificial intelligence model development. The company is bringing self-driving technology to an industry plagued with a chronic driver shortage as well as high overhead due to accidents, shift schedules, and fuel inefficiencies.

TuSimple chose MXNet because of its cross-platform portability, training efficiency, and scalability. One factor was a benchmark that compared MXNet against TensorFlow and found that in an environment with eight GPUs, MXNet was faster, more memory-efficient, and more accurate.

Why MXNet Matters to…

Data scientists

Machine learning is a growing part of the data science landscape. For those who are unfamiliar with the fine points of deep learning model development, MXNet is a good place to start. Its broad language support, Gluon API, and flexibility are well-suited to organizations developing their own deep learning skill sets. Amazon’s endorsement ensures that MXNet will be around for the long term and that the third-party ecosystem will continue to grow. Many experts recommend MXNet as a good starting point for future excursions into more complex deep learning frameworks.

Machine learning researchers

MXNet is often used by researchers for its ability to prototype quickly, which makes it easier to transform their research ideas into models and assess results. It also supports imperative programming, which gives researchers much more control over computation. MXNet has also shown strong performance on certain types of models compared with other frameworks, thanks to its efficient utilization of CPUs and GPUs.

Software developers

Flexibility is a valued commodity in software engineering and MXNet is about as flexible a deep learning framework as can be found. In addition to its broad language support, it works with various data formats, including Amazon S3 cloud storage, and can scale up or down to fit most platforms. In 2019, MXNet added support for Horovod, a distributed learning framework developed by Uber. This offers software engineers even more flexibility in specifying deployment environments, which may include everything from laptops to cloud servers.


MXNet recommends NVIDIA GPUs to train and deploy neural networks because they offer significantly more computation power than CPUs, providing huge performance boosts in training and inference. Developers can easily get started with MXNet using NGC (NVIDIA GPU Cloud). Here, users can pull containers with pre-trained models for a variety of tasks such as computer vision and natural language processing, with all the dependencies and the framework in one container. With NVIDIA TensorRT™, inference performance on MXNet can be improved significantly when using NVIDIA GPUs.

NVIDIA Deep Learning for Developers

GPU-accelerated deep learning frameworks offer the flexibility to design and train custom deep neural networks and provide interfaces to commonly used programming languages such as Python and C/C++. Widely used deep learning frameworks such as MXNet, PyTorch, TensorFlow, and others rely on NVIDIA GPU-accelerated libraries to deliver high-performance, multi-GPU accelerated training.

NVIDIA GPU-accelerated libraries.

NVIDIA GPU Accelerated, End-to-End Data Science

The NVIDIA RAPIDS suite of open-source software libraries, built on CUDA-X AI, provides the ability to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA CUDA primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.

With the RAPIDS GPU DataFrame, data can be loaded onto GPUs using a Pandas-like interface, and then used for various connected machine learning and graph analytics algorithms without ever leaving the GPU. This level of interoperability is made possible through libraries like Apache Arrow. This allows acceleration for end-to-end pipelines—from data prep to machine learning to deep learning.

Data preparation, model training, and visualization.

RAPIDS supports device memory sharing between many popular data science libraries. This keeps data on the GPU and avoids costly copying back and forth to host memory.

Popular data science libraries.