GPU-Accelerated Ansys Fluent

Get started today with this GPU Ready Apps Guide.

Ansys Fluent

ANSYS Fluent is a software tool designed to run computational fluid dynamics (CFD) simulations. It includes the broad physical modeling capabilities needed to model flow, turbulence, heat transfer, and reactions for industrial applications. It is used in a wide variety of industry segments including aerospace, automotive, medical devices, machinery, and semiconductor manufacturing.

ANSYS Fluent software supports solver computation on NVIDIA GPUs. This helps engineers reduce the time required to explore many design variables and optimize product performance to meet design deadlines. The algebraic multigrid (AMG) solver and the radiation heat transfer models, including discrete ordinates (DO) radiation and surface-to-surface (S2S) view factor calculations, are GPU accelerated. Typical industry applications include external aerodynamics, internal fluid flow, and cooling simulations.

Ansys Fluent Runs up to 3.7x Faster on GPUs

Ansys Fluent runs up to 3.7x faster on GPUs, which can reduce time to solution from weeks to days.

In addition to speeding up simulation, GPUs help lower total cost of ownership over a CPU-only solution by delivering higher throughput and performance per watt.


How to Install Ansys Fluent


Ansys Fluent can be downloaded from the Ansys Customer Portal website; brief installation instructions follow. Version 18.1 is the latest release at the time of this writing. Visit the Ansys site to check for a newer release.

Step 1. Download the FLUIDS_181_LINX64.tar file and extract its contents into a single directory. This produces a single subdirectory named “FLUIDS_181_LINX64” containing the installation software.

Step 2. Run INSTALL in silent mode as described in the Installation Guide.

INSTALL -silent -install_dir <path> -fluent

Here, -silent initiates a silent installation and -install_dir <path> specifies the directory where Fluent is to be installed.
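
Steps 1 and 2 can be sketched as a short shell script. This is a minimal dry-run sketch, not an official installer wrapper: the install_fluent and run helpers, the DRY_RUN switch, and the default target directory $HOME/ansys_inc are illustrative assumptions, as is the location of INSTALL at the top of the extracted subdirectory.

```shell
#!/usr/bin/env bash
# Dry-run sketch of Steps 1 and 2: prints the commands unless DRY_RUN=0.

ARCHIVE="FLUIDS_181_LINX64.tar"

# Print a command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# install_fluent <install_dir>
install_fluent() {
  local install_dir="$1"
  # Step 1: extract the archive into the current directory.
  run tar -xf "$ARCHIVE" || return 1
  # Step 2: silent install. Assumption: INSTALL sits at the top of the
  # extracted subdirectory, which is named after the archive.
  run "./${ARCHIVE%.tar}/INSTALL" -silent -install_dir "$install_dir" -fluent
}

# Hypothetical default target directory; override with INSTALL_DIR.
install_fluent "${INSTALL_DIR:-$HOME/ansys_inc}"
```

With the default DRY_RUN=1 the script only echoes the two commands, so the paths can be checked before anything is extracted or installed. Set DRY_RUN=0 to execute them.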

Ansys Licensing

GPUs are supported with all ANSYS HPC license products including ANSYS HPC, ANSYS HPC Pack and ANSYS HPC Workgroup. Each GPU is treated as a CPU core in terms of licensing, so users can gain higher productivity through GPU simulations.

Running Jobs

Run Simulations

To run the parallel version of Ansys Fluent simulations on GPUs, you can use the following syntax in a shell on a Linux system:

fluent -g <version> -t<nprocs> -gpgpu=<ngpgpus> -i <journal_file_name> >& <output_file_name>

Flags and functions

  • fluent command runs ANSYS Fluent interactively
  • -g indicates that the program is to be run without the GUI or graphics
  • <version> specifies the 3d or 3ddp version of ANSYS Fluent
  • <nprocs> specifies the total number of solver processes (CPU cores) across all machines/nodes
  • <ngpgpus> specifies the number of GPUs per machine/node available in parallel mode. Note that the number of processes per machine must be equal on all machines and ngpgpus must be chosen such that the number of processes per machine is an integer multiple of ngpgpus. That is, for nprocs solver processes running on M machines using ngpgpus GPUs per machine, we must have:
    • (nprocs) mod (M) = 0
    • (nprocs/M) mod (ngpgpus) = 0
  • <journal_file_name> specifies the name of the journal or input file.
  • <output_file_name> specifies the name of the output file. It is a file that the background job will create, which will contain the output that ANSYS Fluent would normally print to the screen (for example, the menu prompts and residual reports).
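
The two divisibility conditions on nprocs, M, and ngpgpus can be checked before a job is submitted. The function below is a minimal sketch of such a check. The name check_gpu_layout and its messages are illustrative and not part of ANSYS Fluent.

```shell
#!/usr/bin/env bash
# check_gpu_layout <nprocs> <machines> <ngpgpus>
# Returns 0 if the layout satisfies both conditions, 1 if it violates one,
# and 2 if the arguments are not three positive integers.
check_gpu_layout() {
  if [ "$#" -ne 3 ]; then
    echo "usage: check_gpu_layout <nprocs> <machines> <ngpgpus>"
    return 2
  fi
  local v
  for v in "$@"; do
    case "$v" in
      ''|*[!0-9]*) echo "invalid: arguments must be positive integers"; return 2 ;;
    esac
  done
  # Force base 10 so values such as 08 are not read as octal.
  local nprocs=$((10#$1)) machines=$((10#$2)) ngpgpus=$((10#$3))
  if (( nprocs == 0 || machines == 0 || ngpgpus == 0 )); then
    echo "invalid: arguments must be positive integers"
    return 2
  fi
  # Condition 1: (nprocs) mod (M) = 0
  if (( nprocs % machines != 0 )); then
    echo "invalid: nprocs ($nprocs) is not a multiple of the machine count ($machines)"
    return 1
  fi
  # Condition 2: (nprocs/M) mod (ngpgpus) = 0
  local per_machine=$(( nprocs / machines ))
  if (( per_machine % ngpgpus != 0 )); then
    echo "invalid: $per_machine processes per machine is not a multiple of ngpgpus ($ngpgpus)"
    return 1
  fi
  echo "ok: $per_machine processes per machine, $(( per_machine / ngpgpus )) per GPU"
}

check_gpu_layout 32 2 4   # 32 processes, 2 machines, 4 GPUs per machine
```

For example, 32 processes on 2 machines with 4 GPUs each passes both checks (16 processes per machine, 4 per GPU), while 24 processes on 2 machines with 5 GPUs each fails the second condition.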

A journal file contains a sequence of ANSYS Fluent commands that are identical to those you would type interactively. To add a comment, begin the line with a semicolon.

An example journal file is shown below:

; Read case file
/file/read-case sample.cas.gz
; Change Solution Scheme from Segregated (SIMPLEC)(21) to Coupled(24)
/solve/set/p-v-coupling 24
; Switch the AmgX GPU Aggregator size from default 2 to 4
; Initialize Solution
/solve/initialize/initialize-flow
; Run Iterations
/solve/iterate 100
; Performance Timer Statistics for Iterations
/parallel/timer/usage
; Exit Fluent
exit yes
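
When the same run is repeated for several cases or iteration counts, the journal can be generated from a shell script instead of edited by hand. The sketch below is one way to do that under stated assumptions: the write_journal function and the run.jou file name are hypothetical, and the journal reuses only commands that appear in the example above, so add initialization or other steps as your case requires.

```shell
#!/usr/bin/env bash
# write_journal <case_file> <iterations> <journal_path>
# Writes a Fluent journal that reads a case, selects the coupled scheme,
# iterates, and exits. Lines starting with ';' are journal comments.
write_journal() {
  local case_file="$1" iterations="$2" journal="$3"
  cat > "$journal" <<EOF
; Read case file
/file/read-case $case_file
; Change solution scheme to Coupled (24)
/solve/set/p-v-coupling 24
; Run iterations
/solve/iterate $iterations
; Exit Fluent
exit yes
EOF
}

write_journal sample.cas.gz 100 run.jou

# Show only the commands Fluent will execute (comment lines filtered out).
grep -v '^[[:space:]]*;' run.jou
```

The generated file can then be passed to the launch command through the -i option.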

Model Suitability for GPU Acceleration


NVIDIA partnered with ANSYS to develop a high-performance, robust, and scalable GPU-accelerated AMG library. We call the library AmgX (for AMG Accelerated). Fluent uses AmgX as its default linear solver, and it takes advantage of a CUDA-enabled GPU when it detects one. AmgX can even use MPI to connect clusters of servers to solve very large problems that require dozens of GPUs.

When enabled, GPU acceleration can be used for AMG computations on linear systems with up to 5 coupled equations, and computing requirements grow as the number of cells in the domain increases. Problems with fewer than a few million cells do not gain speed from GPUs because of the communication overhead incurred in transferring matrices to and from the CPUs. However, the speedup is significant for meshes with tens or hundreds of millions of cells, because the overhead is relatively small compared to the computing time in the AMG solver.

A coupled solver benefits most from GPUs. In flow-only problems, the coupled solver typically spends about 60 to 70 percent of its time solving the linear system using AMG, making GPUs a good choice. Since the segregated solver spends only 30 to 40 percent of its time in AMG, GPUs may not be advantageous because of memory transfer overhead costs. By default, GPU acceleration is applied automatically to coupled systems and not to scalar systems, because scalar systems typically are not as computationally expensive. However, if desired, you can enable or disable GPGPU acceleration of the AMG solver for coupled and scalar systems with a text command that lists each supported equation type and lets you enable or disable GPGPU acceleration, choose between the AMG and FGMRES solvers, and specify various solver options.


GPU acceleration will not be used in the following cases:

  • The population balance model is active.
  • The Eulerian multiphase model is active.
  • The system has more than 5 coupled equations.
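
The solver time fractions quoted above give a rough ceiling on what GPU acceleration of AMG alone can deliver. The sketch below applies Amdahl's law, which is not part of the original guide: if a fraction f of the run time is spent in AMG and only that part becomes s times faster, the overall speedup is 1 / ((1 - f) + f / s). The function name amdahl_speedup and the assumed 4x AMG speedup are illustrative, and the estimate ignores the memory transfer overhead mentioned above.

```shell
#!/usr/bin/env bash
# amdahl_speedup <amg_fraction> <amg_speedup>
# Prints the overall speedup when only the AMG share of the run time
# is accelerated by the given factor (Amdahl's law).
amdahl_speedup() {
  LC_ALL=C awk -v f="$1" -v s="$2" \
    'BEGIN { printf "%.2f\n", 1 / ((1 - f) + f / s) }'
}

amdahl_speedup 0.70 4   # coupled solver, 70% in AMG    -> 2.11
amdahl_speedup 0.40 4   # segregated solver, 40% in AMG -> 1.43
```

Even with an arbitrarily fast AMG solve, the bound 1 / (1 - f) limits the coupled solver to about 3.3x at f = 0.70 and the segregated solver to about 1.7x at f = 0.40, which is consistent with the guidance that coupled solvers benefit most.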

Accelerating Discrete Ordinates (DO) Radiation Calculations

The accelerated discrete ordinates (DO) radiation solver is computationally faster than the standard DO solver, especially when used in parallel, although it may take a larger number of iterations to converge.

The solver is based on OpenACC and can run on either architecture, CPUs or GPUs. It is not currently compatible with all models and boundary conditions, but it is found to be extremely fast where applicable. Cases that need very high resolution in discretizing the radiation intensities benefit the most from this accelerated solver. Headlamp simulation is one such application area, where the accelerated solver speeds up the computation several-fold.

After you have selected the DO model in the Radiation Model dialog box, you can enable the accelerated DO solver by using the following text command:

/define/models/radiation/do-acceleration yes

If NVIDIA GPUs are enabled in the Fluent session, this solver will accelerate the DO computations by using the GPUs. In the absence of GPUs, this solver can still be used with the CPU cores to accelerate the DO computations. Note that the accelerated DO solver uses the first-order upwind scheme and an explicit relaxation of 1.0.

The accelerated DO solver is incompatible with some models and settings; when necessary, Fluent will automatically revert to the standard DO solver when the calculation is started and print a message about the conflict.

If you plan to use GPUs with the accelerated DO solver, it is recommended that you start NVIDIA’s Multi-Process Service (MPS) before launching ANSYS Fluent, using the following command:

nvidia-cuda-mps-control -d

MPS is known to improve the robustness and performance of GPU computations with multiple Fluent processes.
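
The MPS daemon can be started before a run and shut down afterwards from a small wrapper. The sketch below assumes the helper names start_mps and stop_mps, which are illustrative. It uses the nvidia-cuda-mps-control -d command shown above and the quit command that the MPS control utility accepts on standard input, and it skips MPS when the utility is not installed.

```shell
#!/usr/bin/env bash
MPS_CTL="nvidia-cuda-mps-control"

# Start the MPS control daemon if the utility is on PATH.
start_mps() {
  if ! command -v "$MPS_CTL" >/dev/null 2>&1; then
    echo "$MPS_CTL not found; continuing without MPS"
    return 0
  fi
  "$MPS_CTL" -d
}

# Ask the MPS control daemon to shut down, if the utility is on PATH.
stop_mps() {
  if command -v "$MPS_CTL" >/dev/null 2>&1; then
    echo quit | "$MPS_CTL"
  fi
}

# Typical use around a Fluent run:
#   start_mps
#   fluent ...   (launch command as shown under Run Simulations)
#   stop_mps
```

Stopping the daemon after the job keeps a shared machine in its default state for the next user.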

Accelerating S2S View Factor Calculations

View factor computations can be accelerated through the raytracing_acc utility, which uses the NVIDIA OptiX library for tracing the rays. The GPU available on the machine running the host process is used in such a scenario, except in a mixed Windows-Linux simulation, where the GPU on node-0 is used. An NVIDIA GPU along with CUDA 6.0 is required for using raytracing_acc. At present, this utility is available only on lnamd64 (Red Hat Enterprise Linux 5/6 and SUSE Linux Enterprise Server 11) and win64 (Windows 7) machines for 3D problems. To use the utility, the CUDA 6.0 library must be accessible through the appropriate environment variable (LD_LIBRARY_PATH on lnamd64 or %path% on win64).
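
On lnamd64, making the CUDA library visible comes down to adding its directory to LD_LIBRARY_PATH before Fluent or the utility is started. The sketch below shows one way to do that without duplicating entries. The function name prepend_ld_library_path is illustrative, and /usr/local/cuda-6.0/lib64 is an assumed install location that should be replaced with the actual CUDA 6.0 library directory on your system.

```shell
#!/usr/bin/env bash
# prepend_ld_library_path <dir>
# Adds <dir> to the front of LD_LIBRARY_PATH unless it is already listed.
prepend_ld_library_path() {
  local dir="$1"
  case ":${LD_LIBRARY_PATH:-}:" in
    *":$dir:"*) ;;  # already present, leave the variable unchanged
    *) LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
  esac
  export LD_LIBRARY_PATH
}

# Assumed CUDA 6.0 library location; adjust for your installation.
prepend_ld_library_path /usr/local/cuda-6.0/lib64
echo "$LD_LIBRARY_PATH"
```

Placing the call in the job script, not in a login profile, keeps the setting scoped to the Fluent run.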

When using the raytracing_acc utility from outside an ANSYS Fluent session, the command line is

utility raytracing_acc [output_s2s_file(optional)]

When using the raytracing_acc utility from inside an ANSYS Fluent session, use the following text command:



This section provides expected performance benchmarks for different models on CPU and GPU systems.

  • Ansys Fluent (Cooling Water Jacket) Performance on CPU and GPU Systems
  • Ansys Fluent (Boeing Landing Gear) Performance on CPU and GPU Systems
  • Ansys Fluent (Formula-1 Race Car) Performance on CPU and GPU Systems
  • Ansys Fluent (Open Wheel Race Car) Performance on CPU and GPU Systems
  • Ansys Fluent (Truck Body) Performance on CPU and GPU Systems
  • Ansys Fluent (Headlamp DO Radiation) Performance on CPU and GPU Systems

Recommended System Configurations

Hardware Configuration

CPU Architecture
System Memory
Minimum 500 GB
2 CPU sockets (8+ cores, 2+ GHz)
GPU Model: NVIDIA Quadro® GP100 for double precision compute
Recommended: 1-2 GPUs per CPU socket

CPU Architecture
System Memory
2 (8+ cores, 2+ GHz)
Total # of Nodes
GPU Model: NVIDIA Tesla® P100
Recommended: 2 GPUs per CPU socket
