Running MPI jobs

One option is to produce a hostfile and feed it directly to the mpirun command of the appropriate MPI distribution. The disadvantage of this approach is that it does not integrate with Slurm, so it does not provide advanced features such as task affinity, accounting, etc.
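A minimal sketch of this approach (assuming an mpirun that accepts a -hostfile flag; the exact option name varies by MPI distribution):

scontrol show hostnames $SLURM_JOB_NODELIST > hostfile.$SLURM_JOB_ID   # expand the Slurm node list into one host name per line
mpirun -hostfile hostfile.$SLURM_JOB_ID -np $SLURM_NTASKS $EXE         # hand the hostfile to mpirun by hand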

Another option is to use the process manager built into Slurm and launch the MPI executable through the srun command. How to do this for various MPI distributions is described at http://slurm.schedmd.com/mpi_guide.html. Some MPI distributions' mpirun commands integrate with Slurm, in which case it is more convenient to use them instead of srun.
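If the MPI library was built with PMI support, the executable can be started directly with srun; a hedged example (the required --mpi plugin depends on how the MPI library was built):

srun --mpi=pmi2 -n $SLURM_NTASKS $EXE   # let Slurm's own process manager spawn the MPI tasks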

For the MPI distributions available at ALICE, the following works (assuming the MPI program is internally threaded with OpenMP). The snippets are meant to run inside a batch script; a generic skeleton is sketched below.
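A minimal tcsh job-script skeleton, with placeholder resource values and executable name (not ALICE-specific settings):

#!/bin/tcsh
#SBATCH --job-name=mpi_test
#SBATCH --ntasks=16            # number of MPI tasks, exposed as $SLURM_NTASKS
#SBATCH --cpus-per-task=1      # increase for hybrid MPI/OpenMP runs
#SBATCH --time=01:00:00

set EXE = ./my_mpi_program     # placeholder path to the MPI executable
# ...distribution-specific module load and launch lines from the sections below go here...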

INTEL MPI

module load [intel,gcc] impi
#for a cluster with Ethernet only, set network fabrics to TCP
setenv I_MPI_FABRICS shm:tcp
#for a cluster with InfiniBand, set network fabrics to OFA
setenv I_MPI_FABRICS shm:ofa
#IMPI option 1 - launch with PMI library - currently not using task affinity, use mpirun instead
setenv I_MPI_PMI_LIBRARY /uufs/CLUSTER.alice/sys/pkg/slurm/std/lib/libpmi.so
#srun -n $SLURM_NTASKS $EXE >& run1.out
#IMPI option 2 - bootstrap
mpirun -bootstrap slurm -np $SLURM_NTASKS $EXE  >& run1.out

MPICH2

Launch MPICH2 jobs with mpiexec (or the equivalent mpirun, used below) as explained at http://slurm.schedmd.com/mpi_guide.html#mpich2. That is:

module load [intel,gcc,pgi] mpich2
setenv MPICH_NEMESIS_NETMOD mxm # default is Ethernet, choose mxm for InfiniBand
mpirun -np $SLURM_NTASKS $EXE

OPENMPI

Use the mpirun command from the OpenMPI distribution. There is no need to specify a hostfile, as OpenMPI obtains the node list from Slurm. To run:

module load [intel,gcc,pgi] openmpi
mpirun --mca btl tcp,self -np $SLURM_NTASKS $EXE # on a cluster with an Ethernet network, such as the general nodes
mpirun -np $SLURM_NTASKS $EXE # on clusters with an InfiniBand network

Note that OpenMPI supports multiple network interfaces, which allows a single MPI executable to run across all HPC clusters, including the InfiniBand network on the CPU nodes.

MVAPICH2

MVAPICH2 executables can be launched with the mpirun command (preferred) or with srun, in which case the --mpi=none flag is needed. To run multi-threaded code, set OMP_NUM_THREADS and MV2_ENABLE_AFFINITY=0 (so that the MPI tasks do not get locked to a single core) before calling srun or mpirun.

module load [intel,gcc,pgi] mvapich2
setenv OMP_NUM_THREADS 6  # optional number of OpenMP threads
setenv MV2_ENABLE_AFFINITY 0 # disable process affinity - only for multi-threaded programs
mpirun -np $SLURM_NTASKS $EXE # mpirun is recommended
srun -n $SLURM_NTASKS --mpi=none $EXE # srun is optional
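To tie these thread settings to the Slurm allocation, a hedged hybrid MPI/OpenMP job-script fragment (task and thread counts and the executable name are placeholders):

#SBATCH --ntasks=4                              # MPI tasks
#SBATCH --cpus-per-task=6                       # OpenMP threads per task

module load gcc mvapich2
setenv OMP_NUM_THREADS $SLURM_CPUS_PER_TASK     # match the thread count to the allocation
setenv MV2_ENABLE_AFFINITY 0                    # allow the threads to spread over the allocated cores
mpirun -np $SLURM_NTASKS ./my_hybrid_program    # placeholder executable name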