GPU-ACCELERATED NAMD
The fastest, easiest way to improve simulation performance by up to 7x.

NAMD Running Instructions

Before running a GPU-accelerated version of NAMD, install the latest NVIDIA display driver for your GPU. To run NAMD you need the namd2 executable, and for multi-node runs you also need the charmrun executable (see the download and installation instructions). Make sure to specify the CPU affinity options as explained below.
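
For example, a quick way to confirm these prerequisites from a shell before launching (nvidia-smi ships with the NVIDIA driver; the PATH lookups below assume the NAMD binaries have been installed into your PATH):

# Check that the NVIDIA driver is loaded and report its version
nvidia-smi

# Confirm the NAMD executables are reachable (adjust if you run them from the unpacked directory instead)
which namd2
which charmrun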

Command Line Options to Run NAMD

General command line to run NAMD on a single-node system:

namd2 {namdOpts} {inputFile}

On a multi-node system NAMD has to be run with charmrun as specified below:

charmrun {charmOpts} namd2 {namdOpts} {inputFile}

{charmOpts}:

  • ++nodelist {nodeListFile} - multi-node runs require a list of nodes
    • Charm++ also supports an alternative ++mpiexec option if you are using a queueing system that mpiexec is set up to recognize.
  • ++p $totalPes - specifies the total number of PE threads
    • This is the total number of worker threads (aka PE threads). We recommend setting this to (#TotalCPUCores - #TotalGPUs).
  • ++ppn $pesPerProcess - number of PEs per process
    • We recommend setting this to #ofCoresPerNode/#ofGPUsPerNode - 1 (see the worked example after this list)
    • This is necessary to free one of the threads per process for communication. Make sure to also specify +commap below.
    • The total number of processes is equal to $totalPes/$pesPerProcess
    • When using the recommended value for this option, each process will use a single GPU
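
As an illustration of the arithmetic above, here is a minimal shell sketch for a hypothetical cluster of 2 nodes, each with 20 CPU cores and 2 GPUs (the hardware values are assumptions for this example; the result matches Example 2 below):

# Assumed hardware: 2 nodes, each with 20 CPU cores and 2 GPUs
NODES=2; CORES_PER_NODE=20; GPUS_PER_NODE=2

# ++ppn: cores per node divided by GPUs per node, minus 1 for the communication thread
PES_PER_PROCESS=$(( CORES_PER_NODE / GPUS_PER_NODE - 1 ))        # = 9

# ++p: total CPU cores minus total GPUs
TOTAL_PES=$(( NODES * CORES_PER_NODE - NODES * GPUS_PER_NODE ))  # = 36

echo "++p $TOTAL_PES ++ppn $PES_PER_PROCESS"                     # prints: ++p 36 ++ppn 9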

{namdOpts}:

  • NAMD will inherit '++p' and '++ppn' as '+p' and '+ppn' if set in {charmOpts}
  • Otherwise, for the multi-core build, use '+p' to set the number of cores.
  • In multi-node runs, use no more than one process per GPU; to get more communication threads, launching exactly one process per GPU is recommended. For single-node runs it is fine to use multiple GPUs per process.
  • CPU affinity options (see user guide):
    • '+setcpuaffinity' - keeps threads from migrating between cores
    • '+pemap #-#' - this maps computational threads to CPU cores
    • '+commap #-#' - this sets range for communication threads
    • Example for dual-socket configuration with 16 cores per socket:
      • +setcpuaffinity +pemap 1-15,17-31 +commap 0,16
  • GPU options (see user guide):
    • '+devices {CUDA IDs}' - optionally specify device IDs to use in NAMD
    • If devices are not in socket order it might be useful to set this option to ensure that sockets use their directly-attached GPUs, for example, '+devices 2,3,0,1'
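
Putting the affinity and GPU options together for the dual-socket example above (16 cores per socket), and assuming 2 GPUs with one attached to each socket, a single-node SMP launch might look like the sketch below. The ++local flag, device IDs, and input file are assumptions for illustration; adjust them to your own node topology:

# 2 processes x (15 PEs + 1 communication thread) = 32 threads on a 32-core node, one GPU per process
charmrun ++local ++p 30 ++ppn 15 ./namd2 +setcpuaffinity +pemap 1-15,17-31 +commap 0,16 +devices 0,1 apoa1.namd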

We recommend always checking the startup messages in NAMD to make sure the options are set correctly. For runs that use charmrun, the ++verbose option provides more detailed output about the launch. Running top or other system tools can help you confirm that you are getting the requested thread mapping.
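
For instance, on Linux you can list the threads of a running NAMD process and the CPU affinity applied to them (this sketch assumes pgrep, top, and taskset are available and that a single namd2 process is running locally):

# Show the threads of the namd2 process so you can count worker and communication threads
top -H -p $(pgrep -n namd2)

# Show the CPU affinity of every thread in that process
taskset -acp $(pgrep -n namd2)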

{inputFile}
Use the corresponding *.namd input file from one of the datasets in the next sub-section.

Example 1. Run ApoA1 on 1 node with 2xGPU and 2xCPU (20 cores total), using the multi-core NAMD build:

./namd2 +p 20 +devices 0,1 apoa1.namd

Example 2. Run STMV on 2 nodes, each with 2xGPU and 2xCPU (20 cores), using the SMP NAMD build (note that we launch 4 processes, each controlling 1 GPU):

charmrun ++p 36 ++ppn 9 ./namd2 ++nodelist $NAMD_NODELIST +setcpuaffinity +pemap 1-9,11-19 +commap 0,10 +devices 0,1 stmv.namd

Note that by default, the "rsh" command is used to start namd2 on each node specified in the nodelist file. You can change this via the CONV_RSH environment variable; for example, to use ssh instead of rsh, run "export CONV_RSH=ssh" (see the NAMD release notes for details).
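
For example, a minimal nodelist file plus an ssh-based launch environment might be set up as follows (the file name and the node1/node2 hostnames are placeholders for this sketch):

# Create a minimal Charm++ nodelist file; replace node1/node2 with your hostnames
cat > nodelist <<'EOF'
group main
  host node1
  host node2
EOF

# Start remote namd2 processes with ssh instead of rsh, and point Example 2 at the file
export CONV_RSH=ssh
export NAMD_NODELIST=./nodelist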
