Running MPI jobs
One option is to produce a hostfile and feed it directly to the mpirun command of the appropriate MPI distribution. The disadvantage of this approach is that it does not integrate with Slurm, so advanced features such as task affinity, accounting, etc. are not available.
Another option is to use the process manager built into Slurm and launch the MPI executable through the srun command. How to do this for various MPI distributions is described at http://slurm.schedmd.com/mpi_guide.html. The mpirun commands of some MPI distributions integrate with Slurm, in which case it is more convenient to use them instead of srun.
For the MPI distributions available at ALICE, the following works (assuming the MPI program is internally threaded with OpenMP).
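The snippets below are meant to be used inside a Slurm batch script. As a point of reference, a minimal tcsh skeleton might look like the following; the resource requests and the executable name are placeholders and should be adapted to your job:

#!/bin/tcsh
#SBATCH --nodes=2                   # example: two nodes
#SBATCH --ntasks=8                  # MPI tasks, available in the job as $SLURM_NTASKS
#SBATCH --cpus-per-task=6           # OpenMP threads per MPI task
#SBATCH --time=01:00:00             # placeholder wall time

set EXE = ./my_mpi_program          # hypothetical executable name
setenv OMP_NUM_THREADS $SLURM_CPUS_PER_TASK

# load one of the MPI stacks and launch as shown in the sections below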
INTEL MPI
module load [intel,gcc] impi
#for a cluster with Ethernet only, set network fabrics to TCP
setenv I_MPI_FABRICS shm:tcp
#for a cluster with InfiniBand, set network fabrics to OFA
setenv I_MPI_FABRICS shm:ofa
#IMPI option 1 - launch with PMI library - currently not using task affinity, use mpirun instead
setenv I_MPI_PMI_LIBRARY /uufs/CLUSTER.alice/sys/pkg/slurm/std/lib/libpmi.so
#srun -n $SLURM_NTASKS $EXE >& run1.out
#IMPI option 2 - bootstrap
mpirun -bootstrap slurm -np $SLURM_NTASKS $EXE >& run1.out
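Only one of the two I_MPI_FABRICS settings should be active, whichever matches the cluster network. Building on the skeleton above, a minimal hybrid MPI/OpenMP sketch for an InfiniBand cluster could be (the thread count is an example value):

module load intel impi
setenv OMP_NUM_THREADS 6                        # example: 6 OpenMP threads per MPI task
setenv I_MPI_FABRICS shm:ofa                    # InfiniBand; use shm:tcp on Ethernet-only nodes
mpirun -bootstrap slurm -np $SLURM_NTASKS $EXE >& run1.out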
MPICH2
Launch MPICH2 jobs with mpiexec as explained at http://slurm.schedmd.com/mpi_guide.html#mpich2. That is:
module load [intel,gcc,pgi] mpich2
setenv MPICH_NEMESIS_NETMOD mxm     # default is Ethernet, choose mxm for InfiniBand
mpirun -np $SLURM_NTASKS $EXE
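On an Ethernet-only cluster the netmod variable can be left at its default or set explicitly to tcp. A minimal sketch, assuming the gcc build and the placeholder executable from the skeleton above:

module load gcc mpich2
setenv MPICH_NEMESIS_NETMOD tcp     # explicit Ethernet; choose mxm on InfiniBand nodes
mpirun -np $SLURM_NTASKS $EXE >& run1.out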
OPENMPI
Use the mpirun command from the OpenMPI distribution. There is no need to specify a hostfile, as OpenMPI obtains the node list from Slurm. To run:
module load [intel,gcc,pgi] openmpi
mpirun --mca btl tcp,self -np $SLURM_NTASKS $EXE    # in case of an Ethernet network cluster, such as the general nodes
mpirun -np $SLURM_NTASKS $EXE                       # in case of InfiniBand network clusters
Note that OpenMPI supports multiple network interfaces, so a single MPI executable can be used across all HPC clusters, including the InfiniBand network on the CPU nodes.
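To check which transports a particular OpenMPI build provides, the ompi_info utility can be queried, for example:

module load gcc openmpi
ompi_info | grep btl                # lists the available byte transfer layer components (e.g. tcp, self, openib)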
MVAPICH2
An MVAPICH2 executable can be launched with the mpirun command (preferred) or with srun, in which case the --mpi=none flag is needed. To run multi-threaded code, make sure to set OMP_NUM_THREADS and MV2_ENABLE_AFFINITY=0 (so that the MPI tasks do not get locked to a single core) before calling srun or mpirun.
module load [intel,gcc,pgi] mvapich2
setenv OMP_NUM_THREADS 6                  # optional number of OpenMP threads
setenv MV2_ENABLE_AFFINITY 0              # disable process affinity - only for multi-threaded programs
mpirun -np $SLURM_NTASKS $EXE             # mpirun is recommended
srun -n $SLURM_NTASKS --mpi=none $EXE     # srun is optional
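Putting everything together, a complete (hypothetical) batch script for a hybrid MVAPICH2 job might look as follows; the resource requests and the executable name are placeholders:

#!/bin/tcsh
#SBATCH --nodes=2
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=6
#SBATCH --time=01:00:00

module load gcc mvapich2
set EXE = ./my_mpi_program              # hypothetical executable name
setenv OMP_NUM_THREADS $SLURM_CPUS_PER_TASK
setenv MV2_ENABLE_AFFINITY 0            # let the OpenMP threads use all allocated cores
mpirun -np $SLURM_NTASKS $EXE >& run1.out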