Modern HPC data centers are key to solving some of the world’s most important scientific and engineering challenges. NVIDIA Data Center GPUs fundamentally change the economics of the data center, delivering breakthrough performance with dramatically fewer servers, less power consumption, and reduced networking overhead, for total cost savings of 5X-10X.

The number of CPU-only servers replaced by a single GPU-accelerated server is called the node replacement factor (NRF). To arrive at the NRF, we measure application performance on up to 8 CPU-only servers, then extrapolate linearly beyond 8 servers. The NRF varies by application.
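
As a rough illustration of the NRF definition above, the sketch below estimates an NRF from a single CPU-only measurement and a single GPU-server measurement. The function name `node_replacement_factor` and its `bigger_is_better` flag are hypothetical and not part of any NVIDIA tool. The sketch assumes perfectly linear CPU scaling, so it only approximates the published values, which come from unrounded measurements on up to 8 CPU-only servers.

```python
def node_replacement_factor(cpu_value, gpu_value, bigger_is_better=True):
    """Estimate NRF as the performance ratio of a GPU server to a CPU-only server.

    Assumption: CPU performance scales linearly with server count, so a GPU
    server that is k times faster stands in for roughly k CPU-only servers.
    """
    if cpu_value <= 0 or gpu_value <= 0:
        raise ValueError("measurements must be positive")
    if bigger_is_better:
        # Throughput metrics such as ns/day: a higher GPU value replaces more servers.
        return gpu_value / cpu_value
    # Elapsed-time metrics such as total seconds: a lower GPU value replaces more servers.
    return cpu_value / gpu_value


# AMBER PME-Cellulose_NPT_4fs, 1x H100 SXM: 10.14 ns/day (CPU) vs 289 ns/day (GPU).
amber_nrf = node_replacement_factor(10.14, 289)

# Chroma HMC Medium, 1x H100 SXM: 21,752 s (CPU) vs 512 s (GPU).
chroma_nrf = node_replacement_factor(21752, 512, bigger_is_better=False)

print(f"AMBER estimate:  {amber_nrf:.1f}x (published: 28x)")
print(f"Chroma estimate: {chroma_nrf:.1f}x (published: 43x)")
```

The "Bigger is better" column in the tables below indicates which branch applies to a given metric: throughput metrics divide the GPU value by the CPU value, and elapsed-time metrics divide the CPU value by the GPU value.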


Detailed H100 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

22.4-AT_23.4

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 10.14 | 289 | 581 | 1,229 | 2,408
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 28x | 57x | 121x | 238x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 10.44 | 292 | 593 | 1,239 | 2,747
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 28x | 57x | 119x | 263x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 59.60 | 1,246 | 2,508 | 5,149 | 11,308
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 21x | 42x | 86x | 190x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 55.29 | 1,292 | 2,588 | 5,263 | 11,507
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 23x | 47x | 95x | 208x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 234.70 | 4,280 | 8,504 | 17,043 | 34,274
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 18x | 36x | 73x | 146x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 232.72 | 4,344 | 8,831 | 17,888 | 36,412
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 19x | 38x | 77x | 156x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 2.98 | 85 | 170 | 341 | 681
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 29x | 57x | 114x | 229x
AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 25.62 | 195 | 391 | 781 | 1,563
AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 8x | 15x | 31x | 61x

AMBER is measured by running multiple independent instances using the NVIDIA Multi-Process Service (MPS).


Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V2023.10

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM
Chroma | Total Time (Sec) | HMC Medium | no | 21,752 | 512 | 284 | 169 | 109
Chroma | NRF | HMC Medium | yes | 1x | 43x | 78x | 132x | 204x

FUN3D

Engineering

Suite of tools actively developed at NASA that models fluid flow for aeronautics and space technology

VERSION

14.0.1

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM
Fun3D [dpw_wbt0_crs-3.6Mn_5] | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 186 | 27 | 16 | 10 | 8
Fun3D [dpw_wbt0_crs-3.6Mn_5] | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 9x | 15x | 25x | 32x
Fun3D [waverider-5M] | Loop Time (Sec) | waverider-5M | no | 257 | 40 | 22 | 14 | 10
Fun3D [waverider-5M] | NRF | waverider-5M | yes | 1x | 10x | 18x | 30x | 42x
Fun3D [waverider-5M w/chemistry] | Loop Time (Sec) | waverider-5M w/chemistry | no | 735 | 116 | 62 | 36 | 23
Fun3D [waverider-5M w/chemistry] | NRF | waverider-5M w/chemistry | yes | 1x | 9x | 17x | 30x | 47x
Fun3D [waverider-20M] | Loop Time (Sec) | waverider-20M | no | 1,088 | - | - | 46 | 28
Fun3D [waverider-20M] | NRF | waverider-20M | yes | 1x | - | - | 31x | 52x
Fun3D [waverider-20M w/chemistry] | Loop Time (Sec) | waverider-20M w/chemistry | no | 3,117 | - | - | 140 | 80
Fun3D [waverider-20M w/chemistry] | NRF | waverider-20M w/chemistry | yes | 1x | - | - | 33x | 57x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2023.2

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 165 | 839 | 1,521 | 2,555 | 5,213
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 5x | 9x | 15x | 32x
GROMACS [Cellulose] | ns/day | Cellulose | yes | 60 | 189 | 262 | 356 | -
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 3x | 4x | 6x | -
GROMACS [STMV] | ns/day | STMV | yes | 13 | 43 | 73 | 123 | 177
GROMACS [STMV] | NRF | STMV | yes | 1x | 3x | 7x | 12x | 18x

GROMACS [ADH Dodec] is measured by running multiple independent instances using MPS


GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas

VERSION

V4.5

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM
GTC | Mpush/Sec | mpi#proc.in | yes | 89 | 768 | 1,422 | 2,780 | 5,196
GTC | NRF | mpi#proc.in | yes | 1x | 9x | 17x | 33x | 62x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research

VERSION

2.6.7_RC

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 964 | 187 | 141 | 108 | 93
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 5x | 7x | 9x | 10x
ICON [QUBICC 160 km resolution] | Integrate_nh (sec) | QUBICC 160 km resolution | no | 798 | 174 | 122 | 92 | 77
ICON [QUBICC 160 km resolution] | NRF | QUBICC 160 km resolution | yes | 1x | 5x | 7x | 9x | 10x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

Stable_2Aug2023

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 3.51E+08 | 1.29E+09 | 2.33E+09 | 4.19E+09 | 7.03E+09
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 4x | 7x | 13x | 21x
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 1.85E+08 | 5.19E+08 | 9.63E+08 | 1.74E+09 | 2.84E+09
LAMMPS [EAM] | NRF | EAM | yes | 1x | 3x | 5x | 10x | 16x
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 1.37E+06 | 1.05E+07 | 1.91E+07 | 3.12E+07 | 4.75E+07
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 11x | 21x | 34x | 52x
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 5.09E+05 | 3.90E+06 | 7.76E+06 | 1.53E+07 | 2.91E+07
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 9x | 18x | 35x | 66x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 1.03E+08 | 9.93E+08 | 1.80E+09 | 3.30E+09 | 5.53E+09
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 11x | 21x | 38x | 63x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_a2f9e61

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM
MILC | Total Time (sec) | Apex Medium | no | 31,577 | 1,163 | 623 | 355 | 215
MILC | NRF | Apex Medium | yes | 1x | 24x | 45x | 79x | 130x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

3.b04

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU, single node

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM
NAMD [apoa1_npt_cuda] | ns/day | apoa1_npt_cuda | yes | 64.49 | 299 | 596 | 1,181 | 2,300
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 5x | 9x | 18x | 36x
NAMD [apoa1_nptsr_cuda] | ns/day | apoa1_nptsr_cuda | yes | 65.19 | 306 | 612 | 1,212 | 2,364
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 5x | 9x | 19x | 36x
NAMD [apoa1_nve_cuda] | ns/day | apoa1_nve_cuda | yes | 71.14 | 377 | 759 | 1,490 | 2,910
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 5x | 11x | 21x | 41x
NAMD [stmv_npt_cuda] | ns/day | stmv_npt_cuda | yes | 6.58 | 25 | 51 | 101 | 203
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 4x | 8x | 15x | 31x
NAMD [stmv_nptsr_cuda] | ns/day | stmv_nptsr_cuda | yes | 6.71 | 26 | 52 | 104 | 208
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 4x | 8x | 16x | 31x
NAMD [stmv_nve_cuda] | ns/day | stmv_nve_cuda | yes | 6.97 | 31 | 61 | 123 | 245
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 4x | 9x | 18x | 35x

NAMD is measured by running multiple independent instances using MPS


Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

V7.2

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM
Quantum Espresso | Total CPU Time (Sec) | AUSURF112-jR | no | 238 | 61 | 40 | 29 | 24
Quantum Espresso | NRF | AUSURF112-jR | yes | 1x | 7x | 11x | 15x | 18x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2023_03

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM
RTM [Isotropic Radius 4] | Mcell/s | Isotropic Radius 4 | yes | 19,173 | 156,724 | 312,444 | 623,334 | 1,246,488
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 8x | 16x | 33x | 65x
RTM [TTI Radius 8 1-pass] | Mcell/s | TTI Radius 8 1-pass | yes | 6,391 | 22,652 | 45,203 | 90,146 | 179,781
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 4x | 7x | 14x | 28x
RTM [TTI RX 2Pass mgpu] | Mcell/s | TTI RX 2Pass mgpu | yes | 6,391 | 21,779 | 43,466 | 86,534 | 172,492
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 3x | 7x | 14x | 27x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_d2105bb

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 386 | 46 | 24 | 14 | 10
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 10x | 18x | 32x | 45x


Detailed L40S application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

22.4-AT_23.4

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x L40S | 2x L40S | 4x L40S | 8x L40S
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 10.14 | 175 | 352 | 713 | 1,459
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 17x | 35x | 70x | 144x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 10.44 | 179 | 358 | 725 | 1,514
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 17x | 34x | 69x | 145x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 59.60 | 983 | 1,972 | 3,994 | 8,259
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 16x | 33x | 67x | 139x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 55.29 | 1,007 | 2,009 | 4,058 | 8,532
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 18x | 36x | 73x | 154x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 234.70 | 4,037 | 8,081 | 16,436 | 32,051
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 17x | 34x | 70x | 137x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 232.72 | 4,065 | 8,259 | 16,728 | 33,529
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 17x | 35x | 72x | 144x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 2.98 | 91 | 183 | 366 | 732
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 31x | 61x | 123x | 246x
AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 25.62 | 191 | 381 | 763 | 1,526
AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 7x | 15x | 30x | 60x

AMBER is measured by running multiple independent instances using MPS


Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V2023.10

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x L40S | 2x L40S | 4x L40S | 8x L40S
Chroma | Total Time (Sec) | HMC Medium | no | 21,752 | 7,791 | 1,002 | 550 | 402
Chroma | NRF | HMC Medium | yes | 1x | 3x | 22x | 40x | 55x

FUN3D

Engineering

Suite of tools actively developed at NASA that models fluid flow for aeronautics and space technology

VERSION

14.0.1

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 2x L40S | 4x L40S | 8x L40S
Fun3D [dpw_wbt0_crs-3.6Mn_5] | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 186 | 66 | 34 | 19
Fun3D [dpw_wbt0_crs-3.6Mn_5] | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 4x | 7x | 13x
Fun3D [waverider-5M] | Loop Time (Sec) | waverider-5M | no | 257 | 85 | 45 | 25
Fun3D [waverider-5M] | NRF | waverider-5M | yes | 1x | 5x | 9x | 16x
Fun3D [waverider-5M w/chemistry] | Loop Time (Sec) | waverider-5M w/chemistry | no | 735 | 241 | 127 | 70
Fun3D [waverider-5M w/chemistry] | NRF | waverider-5M w/chemistry | yes | 1x | 4x | 9x | 15x
Fun3D [waverider-20M] | Loop Time (Sec) | waverider-20M | no | 1,088 | - | 178 | 97
Fun3D [waverider-20M] | NRF | waverider-20M | yes | 1x | - | 8x | 15x
Fun3D [waverider-20M w/chemistry] | Loop Time (Sec) | waverider-20M w/chemistry | no | 3,117 | - | - | 295
Fun3D [waverider-20M w/chemistry] | NRF | waverider-20M w/chemistry | yes | 1x | - | - | 15x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2023.2

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x L40S | 2x L40S | 4x L40S | 8x L40S
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 165 | 712 | 1,551 | 2,658 | 5,234
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 4x | 9x | 16x | 32x
GROMACS [Cellulose] | ns/day | Cellulose | yes | 60 | 179 | 213 | - | -
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 3x | 4x | - | -
GROMACS [STMV] | ns/day | STMV | yes | 13 | 43 | 70 | 100 | 104
GROMACS [STMV] | NRF | STMV | yes | 1x | 3x | 6x | 10x | 11x

GROMACS [ADH Dodec] is measured by running multiple independent instances using MPS


GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas

VERSION

V4.5

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x L40S | 2x L40S | 4x L40S | 8x L40S
GTC | Mpush/Sec | mpi#proc.in | yes | 89 | 442 | 809 | 1,591 | 3,066
GTC | NRF | mpi#proc.in | yes | 1x | 5x | 10x | 19x | 37x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research

VERSION

2.6.7_RC

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x L40S | 2x L40S | 4x L40S
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 964 | 496 | 287 | 167
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 2x | 3x | 6x
ICON [QUBICC 160 km resolution] | Integrate_nh (sec) | QUBICC 160 km resolution | no | 798 | 498 | 280 | 165
ICON [QUBICC 160 km resolution] | NRF | QUBICC 160 km resolution | yes | 1x | 2x | 3x | 5x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

Stable_2Aug2023

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 2x L40S | 4x L40S | 8x L40S
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 1.37E+06 | 3.13E+06 | 5.68E+06 | 8.32E+06
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 3x | 5x | 9x
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 5.09E+05 | - | 2.35E+06 | 4.64E+06
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | - | 6x | 11x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 1.03E+08 | - | 4.88E+08 | 9.46E+08
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | - | 5x | 11x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_a2f9e61

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x L40S | 2x L40S | 4x L40S | 8x L40S
MILC | Total Time (sec) | Apex Medium | no | 31,577 | 4,047 | 2,051 | 1,343 | 1,773
MILC | NRF | Apex Medium | yes | 1x | 7x | 14x | 21x | 16x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

3.b04

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU, single node

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x L40S | 2x L40S | 4x L40S | 8x L40S
NAMD [apoa1_npt_cuda] | ns/day | apoa1_npt_cuda | yes | 64.49 | 230 | 457 | 900 | 1,816
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 4x | 7x | 14x | 28x
NAMD [apoa1_nptsr_cuda] | ns/day | apoa1_nptsr_cuda | yes | 65.19 | 230 | 457 | 910 | 1,803
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 4x | 7x | 14x | 28x
NAMD [apoa1_nve_cuda] | ns/day | apoa1_nve_cuda | yes | 71.14 | 299 | 602 | 1,193 | 2,369
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 4x | 8x | 17x | 33x
NAMD [stmv_npt_cuda] | ns/day | stmv_npt_cuda | yes | 6.58 | 17 | 34 | 67 | 135
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 3x | 5x | 10x | 20x
NAMD [stmv_nptsr_cuda] | ns/day | stmv_nptsr_cuda | yes | 6.71 | 17 | 35 | 70 | 139
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 3x | 5x | 10x | 21x
NAMD [stmv_nve_cuda] | ns/day | stmv_nve_cuda | yes | 6.97 | 22 | 44 | 88 | 176
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 3x | 6x | 13x | 25x

NAMD is measured by running multiple independent instances using MPS


RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2023_03

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x L40S | 2x L40S | 4x L40S | 8x L40S
RTM [Isotropic Radius 4] | Mcell/s | Isotropic Radius 4 | yes | 19,173 | 42,368 | 84,434 | 168,107 | 336,182
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 2x | 4x | 9x | 18x
RTM [TTI Radius 8 1-pass] | Mcell/s | TTI Radius 8 1-pass | yes | 6,391 | 14,322 | 28,300 | 55,928 | 111,681
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 4x | 9x | 17x
RTM [TTI RX 2Pass mgpu] | Mcell/s | TTI RX 2Pass mgpu | yes | 6,391 | - | - | 31,437 | 62,801
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | - | - | 5x | 10x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_d2105bb

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x L40S | 2x L40S | 4x L40S | 8x L40S
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 386 | 171 | 86 | 44 | 23
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 2x | 4x | 10x | 19x


Detailed L4 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

22.4-AT_23.4

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x L4 | 2x L4 | 4x L4 | 8x L4
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 10.14 | 52 | 106 | 212 | 426
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 5x | 10x | 21x | 42x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 10.44 | 54 | 108 | 215 | 433
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 5x | 10x | 21x | 41x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 59.60 | 259 | 519 | 1,039 | 2,142
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 4x | 9x | 17x | 36x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 55.29 | 265 | 533 | 1,066 | 2,132
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 5x | 10x | 19x | 39x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 234.70 | 1,225 | 2,450 | 4,931 | 9,899
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 5x | 10x | 21x | 42x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 232.72 | 1,241 | 2,481 | 5,018 | 10,161
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 5x | 11x | 22x | 44x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 2.98 | 20 | 40 | 81 | 162
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 7x | 14x | 27x | 54x
AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 25.62 | 114 | 227 | 455 | 910
AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 4x | 9x | 18x | 36x

AMBER is measured by running multiple independent instances using MPS


Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V2023.10

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 2x L4 | 4x L4 | 8x L4
Chroma | Total Time (Sec) | HMC Medium | no | 21,752 | 6,426 | 1,452 | 790
Chroma | NRF | HMC Medium | yes | 1x | 3x | 15x | 28x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2023.2

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 4x L4 | 8x L4
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 165 | 809 | 1,665
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 5x | 10x

GROMACS [ADH Dodec] is measured by running multiple independent instances using MPS


GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas

VERSION

V4.5

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x L4 | 2x L4 | 4x L4 | 8x L4
GTC | Mpush/Sec | mpi#proc.in | yes | 89 | 184 | 334 | 664 | 1,234
GTC | NRF | mpi#proc.in | yes | 1x | 2x | 4x | 8x | 15x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_a2f9e61

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 2x L4 | 4x L4 | 8x L4
MILC | Total Time (sec) | Apex Medium | no | 31,577 | 5,875 | 3,002 | 1,587
MILC | NRF | Apex Medium | yes | 1x | 5x | 9x | 18x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

3.b04

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU, single node

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 4x L4 | 8x L4
NAMD [apoa1_nve_cuda] | ns/day | apoa1_nve_cuda | yes | 71.14 | 357 | 713
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 5x | 10x

NAMD is measured by running multiple independent instances using MPS



Detailed A100 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

22.4-AT_23.4

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 10.14 | 182 | 368 | 741 | 1,470 | 174 | 336 | 696 | 1,414
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 18x | 36x | 73x | 145x | 17x | 33x | 69x | 139x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 10.44 | 186 | 370 | 750 | 1,485 | 177 | 343 | 712 | 1,460
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 18x | 35x | 72x | 142x | 17x | 33x | 68x | 140x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 59.60 | 798 | 1,606 | 3,221 | 6,379 | 771 | 1,487 | 3,062 | 6,349
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 13x | 27x | 54x | 107x | 13x | 25x | 51x | 107x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 55.29 | 817 | 1,630 | 3,246 | 6,559 | 790 | 1,577 | 3,181 | 6,655
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 15x | 29x | 59x | 119x | 14x | 29x | 58x | 120x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 234.70 | 2,888 | 5,738 | 11,503 | 23,868 | 2,875 | 5,582 | 11,506 | 23,739
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 12x | 24x | 49x | 102x | 12x | 24x | 49x | 101x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 232.72 | 2,966 | 5,877 | 11,721 | 23,515 | 2,866 | 5,912 | 11,891 | 24,101
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 13x | 25x | 50x | 101x | 12x | 25x | 51x | 104x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 2.98 | 53 | 107 | 214 | 427 | 52 | 105 | 210 | 420
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 18x | 36x | 72x | 143x | 18x | 35x | 70x | 141x
AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 25.62 | 134 | 269 | 537 | 1,074 | 135 | 270 | 539 | 1,078
AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 5x | 10x | 21x | 42x | 5x | 11x | 21x | 42x

AMBER is measured by running multiple independent instances using MPS


Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V2023.10

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
Chroma | Total Time (Sec) | HMC Medium | no | 21,752 | 893 | 488 | 280 | 176 | 922 | 540 | 385 | 300
Chroma | NRF | HMC Medium | yes | 1x | 25x | 45x | 79x | 126x | 24x | 41x | 58x | 74x

FUN3D

Engineering

Suite of tools actively developed at NASA that models fluid flow for aeronautics and space technology

VERSION

14.0.1

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
Fun3D [dpw_wbt0_crs-3.6Mn_5] | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 186 | 48 | 26 | 16 | 11 | 48 | 26 | 15 | 12
Fun3D [dpw_wbt0_crs-3.6Mn_5] | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 5x | 9x | 15x | 21x | 5x | 9x | 16x | 21x
Fun3D [waverider-5M] | Loop Time (Sec) | waverider-5M | no | 257 | 72 | 40 | 23 | 16 | 72 | 39 | 22 | 16
Fun3D [waverider-5M] | NRF | waverider-5M | yes | 1x | 6x | 10x | 17x | 26x | 6x | 10x | 18x | 26x
Fun3D [waverider-5M w/chemistry] | Loop Time (Sec) | waverider-5M w/chemistry | no | 735 | 195 | 103 | 57 | 35 | 199 | 103 | 57 | 35
Fun3D [waverider-5M w/chemistry] | NRF | waverider-5M w/chemistry | yes | 1x | 6x | 11x | 19x | 31x | 5x | 10x | 19x | 31x
Fun3D [waverider-20M] | Loop Time (Sec) | waverider-20M | no | 1,088 | - | - | 81 | 46 | - | - | 81 | 47
Fun3D [waverider-20M] | NRF | waverider-20M | yes | 1x | - | - | 18x | 31x | - | - | 18x | 31x
Fun3D [waverider-20M w/chemistry] | Loop Time (Sec) | waverider-20M w/chemistry | no | 3,117 | - | - | 228 | 124 | - | - | - | 128
Fun3D [waverider-20M w/chemistry] | NRF | waverider-20M w/chemistry | yes | 1x | - | - | 20x | 37x | - | - | - | 36x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2023.2

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 165 | 492 | 973 | 1,747 | 3,671 | 482 | 913 | 1,369 | 3,588
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 3x | 6x | 11x | 22x | 3x | 6x | 8x | 22x
GROMACS [STMV] | ns/day | STMV | yes | 13 | 24 | 44 | 81 | 129 | 23 | 39 | 67 | -
GROMACS [STMV] | NRF | STMV | yes | 1x | 2x | 4x | 8x | 13x | 2x | 3x | 6x | -

GROMACS [ADH Dodec] is measured by running multiple independent instances using MPS


GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas

VERSION

V4.5

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
GTC | Mpush/Sec | mpi#proc.in | yes | 89 | 489 | 906 | 1,787 | 3,327 | 490 | 888 | 1,726 | 2,796
GTC | NRF | mpi#proc.in | yes | 1x | 6x | 11x | 21x | 40x | 6x | 11x | 21x | 34x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research

VERSION

2.6.7_RC

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 964 | 284 | 201 | 149 | 292 | 207 | 155
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 3x | 5x | 6x | 3x | 5x | 6x
ICON [QUBICC 160 km resolution] | Integrate_nh (sec) | QUBICC 160 km resolution | no | 798 | 268 | 180 | 131 | 269 | 179 | 129
ICON [QUBICC 160 km resolution] | NRF | QUBICC 160 km resolution | yes | 1x | 3x | 4x | 6x | 3x | 4x | 6x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

Stable_2Aug2023

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 3.51E+08 | 6.96E+08 | 1.30E+09 | 2.34E+09 | 4.05E+09 | 6.65E+08 | 1.21E+09 | 1.99E+09 | -
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 2x | 4x | 7x | 12x | 2x | 3x | 6x | -
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 1.85E+08 | 3.01E+08 | 5.62E+08 | 1.01E+09 | 1.67E+09 | 2.92E+08 | 5.41E+08 | 9.28E+08 | -
LAMMPS [EAM] | NRF | EAM | yes | 1x | 2x | 3x | 6x | 10x | 2x | 3x | 5x | -
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 1.37E+06 | 5.66E+06 | 1.05E+07 | 1.76E+07 | 2.76E+07 | 5.70E+06 | 1.01E+07 | 1.70E+07 | 1.95E+07
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 5x | 11x | 19x | 30x | 5x | 11x | 18x | 21x
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 5.09E+05 | 2.21E+06 | 4.40E+06 | 8.73E+06 | 1.66E+07 | 2.08E+06 | 4.21E+06 | 8.21E+06 | 1.58E+07
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 6x | 10x | 20x | 38x | 5x | 10x | 19x | 36x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 1.03E+08 | 5.49E+08 | 1.02E+09 | 1.83E+09 | 3.16E+09 | 5.29E+08 | 9.28E+08 | 1.51E+09 | 1.58E+09
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 6x | 12x | 21x | 36x | 5x | 11x | 17x | 18x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_a2f9e61

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB
MILC | Total Time (sec) | Apex Medium | no | 31,577 | 2,035 | 1,188 | 625 | 358 | 2,090 | 1,119 | 612
MILC | NRF | Apex Medium | yes | 1x | 14x | 24x | 45x | 78x | 13x | 25x | 46x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

3.b04

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU, single node

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
NAMD [apoa1_npt_cuda] | ns/day | apoa1_npt_cuda | yes | 64.49 | 176 | 350 | 687 | 1,381 | 173 | 340 | 687 | 1,370
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 3x | 5x | 11x | 21x | 3x | 5x | 11x | 21x
NAMD [apoa1_nptsr_cuda] | ns/day | apoa1_nptsr_cuda | yes | 65.19 | 180 | 358 | 714 | 1,421 | 178 | 350 | 704 | 1,409
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 3x | 5x | 11x | 22x | 3x | 5x | 11x | 22x
NAMD [apoa1_nve_cuda] | ns/day | apoa1_nve_cuda | yes | 71.14 | 220 | 436 | 867 | 1,734 | 216 | 424 | 850 | 1,705
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 3x | 6x | 12x | 24x | 3x | 6x | 12x | 24x
NAMD [stmv_npt_cuda] | ns/day | stmv_npt_cuda | yes | 6.58 | 15 | 29 | 58 | 116 | 14 | 28 | 57 | 114
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 2x | 4x | 9x | 18x | 2x | 4x | 9x | 17x
NAMD [stmv_nptsr_cuda] | ns/day | stmv_nptsr_cuda | yes | 6.71 | 15 | 30 | 59 | 119 | 15 | 29 | 58 | 117
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 2x | 4x | 9x | 18x | 2x | 4x | 9x | 17x
NAMD [stmv_nve_cuda] | ns/day | stmv_nve_cuda | yes | 6.97 | 17 | 34 | 69 | 137 | 17 | 33 | 67 | 135
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 2x | 5x | 10x | 20x | 2x | 5x | 10x | 19x

NAMD is measured by running multiple independent instances using MPS
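
Because the instances are independent, aggregate throughput would be expected to grow almost linearly with GPU count. A rough Python sketch of that check follows, using the apoa1_npt_cuda ns/day values for A100 SXM4 80GB listed above. The name `scaling_efficiency` is hypothetical, and the definition used here (throughput on n GPUs divided by n times the 1-GPU throughput) is an assumption chosen for illustration, not a metric published on this page.

```python
# ns/day for NAMD [apoa1_npt_cuda] on A100 SXM4 80GB, keyed by GPU count
# (values copied from the table above).
throughput_ns_per_day = {1: 176, 2: 350, 4: 687, 8: 1381}


def scaling_efficiency(throughput: dict[int, float]) -> dict[int, float]:
    """Return throughput[n] / (n * throughput[1]) for each GPU count n."""
    single_gpu = throughput[1]
    return {n: value / (n * single_gpu) for n, value in throughput.items()}


efficiency = scaling_efficiency(throughput_ns_per_day)

for gpus, eff in sorted(efficiency.items()):
    print(f"{gpus} GPU(s): {eff:.1%} of ideal linear scaling")
```

An efficiency close to 100% at 8 GPUs is consistent with independent instances that do not communicate with each other.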


Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

V7.2

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

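| Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Quantum Espresso | Total CPU Time (Sec) | AUSURF112-jR | no | 238 | 107 | 67 | 43 | 31 | 110 | 65 | 45 | 35 |
| Quantum Espresso | NRF | AUSURF112-jR | yes | 1x | 4x | 6x | 10x | 14x | 4x | 7x | 10x | 12x |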

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2023_03

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

| Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RTM [Isotropic Radius 4] | Mcell/s | Isotropic Radius 4 | yes | 19,173 | 91,805 | 182,810 | 364,439 | 727,715 | 87,309 | 173,842 | 346,547 | 693,278 |
| RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 5x | 10x | 19x | 38x | 5x | 9x | 18x | 36x |
| RTM [TTI Radius 8 1-pass] | Mcell/s | TTI Radius 8 1-pass | yes | 6,391 | 13,067 | 26,045 | 51,877 | 103,477 | 12,904 | 25,383 | 50,754 | 101,508 |
| RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 4x | 8x | 16x | 2x | 4x | 8x | 16x |
| RTM [TTI RX 2Pass mgpu] | Mcell/s | TTI RX 2Pass mgpu | yes | 6,391 | 13,074 | 25,978 | 51,814 | 103,446 | 12,214 | 23,653 | 47,703 | 96,014 |
| RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 2x | 4x | 8x | 16x | 2x | 4x | 7x | 15x |

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_d2105bb

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

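| Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 386 | 77 | 40 | 21 | 13 | 79 | 41 | 22 | 15 |
| SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 4x | 11x | 20x | 33x | 4x | 11x | 20x | 30x |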


Detailed A30 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate the molecular dynamics of biomolecules

VERSION

22.4-AT_23.4

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

| Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 10.14 | 89 | 180 | 356 | 732 |
| AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 9x | 18x | 35x | 72x |
| AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 10.44 | 91 | 183 | 365 | 743 |
| AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 9x | 18x | 35x | 71x |
| AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 59.60 | 408 | 822 | 1,625 | 3,334 |
| AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 7x | 14x | 27x | 56x |
| AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 55.29 | 415 | 841 | 1,669 | 3,401 |
| AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 8x | 15x | 30x | 62x |
| AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 234.70 | 1,514 | 3,005 | 5,974 | 12,461 |
| AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 6x | 13x | 25x | 53x |
| AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 232.72 | 1,540 | 3,053 | 6,171 | 12,483 |
| AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 7x | 13x | 27x | 54x |
| AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 2.98 | 29 | 58 | 116 | 231 |
| AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 10x | 19x | 39x | 78x |
| AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 25.62 | 97 | 194 | 388 | 775 |
| AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 4x | 8x | 15x | 30x |

AMBER is measured by running multiple independent instances using MPS


Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V2023.10

ACCELERATED FEATURES

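  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

| Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 2x A30 | 4x A30 | 8x A30 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Chroma | Total Time (Sec) | HMC Medium | no | 21,752 | 4,732 | 615 | 440 |
| Chroma | NRF | HMC Medium | yes | 1x | 5x | 36x | 50x |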

FUN3D

Engineering

Suite of tools actively developed at NASA for modeling fluid flow in aeronautics and space technology applications

VERSION

14.0.1

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

| Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Fun3D [dpw_wbt0_crs-3.6Mn_5] | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 186 | 97 | 49 | 26 | 17 |
| Fun3D [dpw_wbt0_crs-3.6Mn_5] | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 2x | 5x | 9x | 14x |
| Fun3D [waverider-5M] | Loop Time (Sec) | waverider-5M | no | 257 | 142 | 73 | 39 | 24 |
| Fun3D [waverider-5M] | NRF | waverider-5M | yes | 1x | 2x | 6x | 10x | 17x |
| Fun3D [waverider-5M w/chemistry] | Loop Time (Sec) | waverider-5M w/chemistry | no | 735 | - | 201 | 106 | 59 |
| Fun3D [waverider-5M w/chemistry] | NRF | waverider-5M w/chemistry | yes | 1x | - | 5x | 10x | 18x |
| Fun3D [waverider-20M] | Loop Time (Sec) | waverider-20M | no | 1,088 | - | - | 152 | 83 |
| Fun3D [waverider-20M] | NRF | waverider-20M | yes | 1x | - | - | 10x | 18x |
| Fun3D [waverider-20M w/chemistry] | Loop Time (Sec) | waverider-20M w/chemistry | no | 3,117 | - | - | - | 235 |
| Fun3D [waverider-20M w/chemistry] | NRF | waverider-20M w/chemistry | yes | 1x | - | - | - | 19x |

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas

VERSION

V4.5

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

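| Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GTC | Mpush/Sec | mpi#proc.in | yes | 89 | 287 | 533 | 1,050 | 1,872 |
| GTC | NRF | mpi#proc.in | yes | 1x | 3x | 6x | 13x | 22x |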

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research

VERSION

2.6.7_RC

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

| Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x A30 | 2x A30 | 4x A30 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 964 | 508 | 321 | 212 |
| ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 2x | 3x | 5x |
| ICON [QUBICC 160 km resolution] | Integrate_nh (sec) | QUBICC 160 km resolution | no | 798 | 466 | 279 | 177 |
| ICON [QUBICC 160 km resolution] | NRF | QUBICC 160 km resolution | yes | 1x | 2x | 3x | 5x |

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

Stable_2Aug2023

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials

| Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 1.37E+06 | 3.09E+06 | 5.86E+06 | 1.04E+07 | 1.40E+07 |
| LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 3x | 5x | 11x | 15x |
| LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 5.09E+05 | 1.12E+06 | 2.23E+06 | 4.43E+06 | 8.55E+06 |
| LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 2x | 6x | 10x | 19x |
| LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 1.03E+08 | 2.67E+08 | 4.97E+08 | 8.58E+08 | 1.10E+09 |
| LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 3x | 5x | 10x | 13x |

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_a2f9e61

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

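| Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MILC | Total Time (sec) | Apex Medium | no | 31,577 | 4,701 | 2,030 | 1,084 | 713 |
| MILC | NRF | Apex Medium | yes | 1x | 6x | 14x | 26x | 39x |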

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

3.b04

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU, single node

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

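| Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 2x A30 | 4x A30 | 8x A30 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| NAMD [apoa1_npt_cuda] | ns/day | apoa1_npt_cuda | yes | 64.49 | 183 | 367 | 728 |
| NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 3x | 6x | 11x |
| NAMD [apoa1_nptsr_cuda] | ns/day | apoa1_nptsr_cuda | yes | 65.19 | 188 | 376 | 746 |
| NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 3x | 6x | 11x |
| NAMD [apoa1_nve_cuda] | ns/day | apoa1_nve_cuda | yes | 71.14 | 222 | 445 | 886 |
| NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 3x | 6x | 12x |
| NAMD [stmv_npt_cuda] | ns/day | stmv_npt_cuda | yes | 6.58 | - | 30 | 61 |
| NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | - | 5x | 9x |
| NAMD [stmv_nptsr_cuda] | ns/day | stmv_nptsr_cuda | yes | 6.71 | - | 31 | 62 |
| NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | - | 5x | 9x |
| NAMD [stmv_nve_cuda] | ns/day | stmv_nve_cuda | yes | 6.97 | - | 34 | 69 |
| NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | - | 5x | 10x |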

NAMD is measured by running multiple independent instances using MPS


Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

V7.2

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

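| Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 2x A30 | 4x A30 | 8x A30 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Quantum Espresso | Total CPU Time (Sec) | AUSURF112-jR | no | 238 | 106 | 64 | 45 |
| Quantum Espresso | NRF | AUSURF112-jR | yes | 1x | 4x | 7x | 10x |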

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2023_03

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

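| Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RTM [Isotropic Radius 4] | Mcell/s | Isotropic Radius 4 | yes | 19,173 | 45,704 | 91,083 | 181,345 | 362,678 |
| RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 2x | 5x | 9x | 19x |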

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_d2105bb

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

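| Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 386 | 159 | 81 | 42 | 23 |
| SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 2x | 4x | 10x | 19x |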


Detailed A40 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate the molecular dynamics of biomolecules

VERSION

22.4-AT_23.4

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

| Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 10.14 | 97 | 197 | 397 | 819 |
| AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 10x | 19x | 39x | 81x |
| AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 10.44 | 99 | 200 | 403 | 839 |
| AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 9x | 19x | 39x | 80x |
| AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 59.60 | 491 | 996 | 1,988 | 4,028 |
| AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 8x | 17x | 33x | 68x |
| AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 55.29 | 503 | 1,014 | 2,040 | 4,248 |
| AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 9x | 18x | 37x | 77x |
| AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 234.70 | 1,925 | 3,884 | 7,769 | 16,230 |
| AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 8x | 17x | 33x | 69x |
| AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 232.72 | 1,952 | 3,977 | 8,037 | 16,580 |
| AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 8x | 17x | 35x | 71x |
| AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 2.98 | 32 | 63 | 127 | 254 |
| AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 11x | 21x | 43x | 85x |
| AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 25.62 | 119 | 238 | 475 | 950 |
| AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 5x | 9x | 19x | 37x |

AMBER is measured by running multiple independent instances using MPS
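
Since the A30 and A40 AMBER tables share the same test modules and CPU-only baseline, the two GPUs can be compared directly on throughput. The sketch below is an illustrative Python calculation using three 8-GPU ns/day values from the A30 table earlier on this page and the A40 table above. The dictionary names and the choice of rows are assumptions made for this example, and the resulting ratios are simple quotients rather than figures published here.

```python
# 8-GPU ns/day values copied from the A30 and A40 AMBER tables on this page.
a30_ns_per_day = {
    "PME-Cellulose_NPT_4fs": 732,
    "PME-STMV_NPT_4fs": 231,
    "FEP-GTI_Complex 1fs": 775,
}
a40_ns_per_day = {
    "PME-Cellulose_NPT_4fs": 819,
    "PME-STMV_NPT_4fs": 254,
    "FEP-GTI_Complex 1fs": 950,
}

# Ratio above 1.0 means the 8x A40 configuration delivered more ns/day
# than the 8x A30 configuration on that test.
a40_over_a30 = {
    test: a40_ns_per_day[test] / a30_ns_per_day[test]
    for test in a30_ns_per_day
}

for test, ratio in a40_over_a30.items():
    print(f"{test}: 8x A40 / 8x A30 = {ratio:.2f}")
```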


Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V2023.10

ACCELERATED FEATURES

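  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

| Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Chroma | Total Time (Sec) | HMC Medium | no | 21,752 | 6,896 | 1,391 | 774 | 540 |
| Chroma | NRF | HMC Medium | yes | 1x | 3x | 16x | 29x | 41x |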

FUN3D

Engineering

Suite of tools actively developed at NASA for modeling fluid flow in aeronautics and space technology applications

VERSION

14.0.1

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

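| Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 4x A40 | 8x A40 |
| --- | --- | --- | --- | --- | --- | --- |
| Fun3D [waverider-5M] | Loop Time (Sec) | waverider-5M | no | 257 | 76 | 41 |
| Fun3D [waverider-5M] | NRF | waverider-5M | yes | 1x | 5x | 10x |
| Fun3D [waverider-20M] | Loop Time (Sec) | waverider-20M | no | 1,088 | - | 162 |
| Fun3D [waverider-20M] | NRF | waverider-20M | yes | 1x | - | 9x |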

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2023.2

ACCELERATED FEATURES

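  • Implicit (5x), Explicit (2x) Solvent

| Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 165 | 348 | 652 | 1,110 | 2,586 |
| GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 2x | 4x | 7x | 16x |
| GROMACS [STMV] | ns/day | STMV | yes | 13 | 20 | 37 | 59 | - |
| GROMACS [STMV] | NRF | STMV | yes | 1x | 2x | 3x | 5x | - |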

GROMACS [ADH Dodec] is measured by running multiple independent instances using MPS


GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas

VERSION

V4.5

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

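| Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GTC | Mpush/Sec | mpi#proc.in | yes | 89 | 303 | 553 | 1,088 | 1,926 |
| GTC | NRF | mpi#proc.in | yes | 1x | 3x | 7x | 13x | 23x |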

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_a2f9e61

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

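| Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MILC | Total Time (sec) | Apex Medium | no | 31,577 | 6,005 | 3,094 | 1,701 | 1,034 |
| MILC | NRF | Apex Medium | yes | 1x | 5x | 9x | 17x | 27x |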

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

3.b04

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU, single node

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

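| Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NAMD [apoa1_npt_cuda] | ns/day | apoa1_npt_cuda | yes | 64.49 | 105 | 211 | 423 | 845 |
| NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 2x | 3x | 7x | 13x |
| NAMD [apoa1_nptsr_cuda] | ns/day | apoa1_nptsr_cuda | yes | 65.19 | 109 | 221 | 441 | 885 |
| NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 2x | 3x | 7x | 14x |
| NAMD [apoa1_nve_cuda] | ns/day | apoa1_nve_cuda | yes | 71.14 | 146 | 295 | 593 | 1,187 |
| NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 2x | 4x | 8x | 17x |
| NAMD [stmv_npt_cuda] | ns/day | stmv_npt_cuda | yes | 6.58 | - | - | 32 | 65 |
| NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | - | - | 5x | 10x |
| NAMD [stmv_nptsr_cuda] | ns/day | stmv_nptsr_cuda | yes | 6.71 | - | 17 | 34 | 68 |
| NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | - | 3x | 5x | 10x |
| NAMD [stmv_nve_cuda] | ns/day | stmv_nve_cuda | yes | 6.97 | 11 | 21 | 42 | 85 |
| NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 2x | 3x | 6x | 12x |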

NAMD is measured by running multiple independent instances using MPS


RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2023_03

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

| Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RTM [Isotropic Radius 4] | Mcell/s | Isotropic Radius 4 | yes | 19,173 | 30,645 | 61,112 | 121,867 | 243,733 |
| RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 2x | 3x | 6x | 13x |
| RTM [TTI Radius 8 1-pass] | Mcell/s | TTI Radius 8 1-pass | yes | 6,391 | - | 18,312 | 36,308 | 72,327 |
| RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | - | 3x | 6x | 11x |

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_d2105bb

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

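| Application | Metric | Test Modules | Bigger is better | Dual Xeon Platinum 8480C+ (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 386 | 203 | 103 | 53 | 34 |
| SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 2x | 3x | 8x | 13x |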