Using Node802

From ALICE Documentation

This section provides information about node802 and how to use it.

Important information about access

Node802 is not available to all ALICE users. It was purchased by a research group from the Mathematical Institute and integrated into the ALICE compute environment. Thus, only members of the research group have access to the node.

Overview

Hardware

  • The basic hardware configuration of node802 is available here
  • Hyperthreading is active

Software

  • Because this node uses AMD CPUs, there is a new and separate AMD software stack.
  • The main ALICE software stack was compiled for Intel CPUs and should not be used for jobs on node802.
  • New software can be added to the stack by contacting the ALICE Helpdesk. New software has to be compiled on node802.
  • To get a list of currently available modules in the AMD software stack, use module load ALICE/AMD and then module avail
  • If you want to go back to the Intel software stack, use module load ALICE/Intel
  • For a batch script, you can also use module load ALICE/default which will load the correct software stack based on the CPU architecture of the node.
  • Almost all of the requested software is available. The following packages are missing or not yet fully functional:
    • Magma Computer Algebra is not yet available
    • While GMP is installed, MPIR is not yet available.
    • Julia is installed. However, OSCAR is not yet available due to installation issues.
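
The stack selection described above can be sketched as follows. This is only an illustration: the helper names stack_for_vendor and detect_vendor are hypothetical and not part of ALICE, and in practice module load ALICE/default already makes this choice for you. The module names are the ones listed above; the vendor strings are the usual vendor_id values found in /proc/cpuinfo on Linux.

```shell
#!/bin/bash
# Hypothetical helper: map a CPU vendor string to the matching ALICE stack module.
stack_for_vendor() {
    case "$1" in
        AuthenticAMD) echo "ALICE/AMD" ;;
        GenuineIntel) echo "ALICE/Intel" ;;
        *)            echo "unknown"; return 1 ;;
    esac
}

# Hypothetical helper: print the vendor of the first CPU listed in /proc/cpuinfo (Linux only).
detect_vendor() {
    awk -F': ' '/^vendor_id/ { print $2; exit }' /proc/cpuinfo
}

# On the cluster (not run here), the two helpers could be combined like this:
#   module load "$(stack_for_vendor "$(detect_vendor)")"
#   module avail
```

On node802 the first helper would be expected to pick ALICE/AMD, on the Intel-based login nodes ALICE/Intel.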

Local node scratch

  • The two 10TB HDDs were combined into a single volume of about 20TB.

Slurm specifics

  • There is a dedicated partition for this machine: mem_mi
  • Access to this machine is regulated via a reservation: mem_mi
    • Only users who have been added to the reservation can use the machine.
    • New members of the research group can be added to the reservation by sending a mail to the ALICE Helpdesk.
    • You can check the list of users like this: scontrol show reservation mem_mi
  • There is a dedicated QOS which currently allows a single user to submit a maximum of 30 jobs. This and other settings can of course be changed.
  • By default, a job will get all the node's memory assigned unless the amount of memory is specified with --mem (see example). This can be changed.
  • There is no time limit on how long a job can run on the node (see sinfo). This can also be changed of course.
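
Before submitting, it can be useful to check whether your account is already part of the reservation. The sketch below is a hypothetical helper (user_in_reservation is not an ALICE or Slurm command). It assumes that the output of scontrol show reservation lists the members in a comma-separated Users= field, which is how Slurm normally prints it; if the reservation were defined via accounts or groups instead, this check would not apply.

```shell
#!/bin/bash
# Hypothetical helper: succeed if user $1 is listed in the Users= field
# of `scontrol show reservation` output read from stdin.
user_in_reservation() {
    # Split on spaces, keep the value of Users=, split it on commas,
    # then look for an exact, literal match of the user name.
    tr ' ' '\n' | sed -n 's/^Users=//p' | tr ',' '\n' | grep -qxF -- "$1"
}

# On the cluster (not run here):
#   scontrol show reservation mem_mi | user_in_reservation "$USER" \
#       && echo "You can submit to mem_mi"
```

If the check fails, the ALICE Helpdesk can add you to the reservation as described above.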

About your own scripts/programmes

  • Because an AMD machine needs a separate software stack, do not compile programs for node802 with the GNU compilers (or similar) on the login nodes, which are Intel-based.
  • Instead, always compile such scripts/software on node802 as part of your job.
  • If your source code does not change, you only need to compile once. In the first job, compile the program on node802 and copy the compiled program back to your shared storage or home directory. In later jobs, use the already compiled version (see example below).
  • You can still use the login nodes for testing/debugging. In this case, compile on the login nodes for your test, and compile again on the compute node for your job.
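
The compile-once approach from the list above could be automated with a small check like the following. This is a minimal sketch, not part of the ALICE setup: needs_compile is a hypothetical helper that treats a binary as up to date when it exists, is executable and is not older than its source file.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the binary $2 is missing (or not executable)
# or older than the source file $1, i.e. if a (re)compile on node802 is needed.
needs_compile() {
    local src="$1" bin="$2"
    [ ! -x "$bin" ] || [ "$src" -nt "$bin" ]
}

# Inside a job on node802 (not run here), this could guard the compile step:
#   if needs_compile omp_hello.c omp_hello_amd; then
#       gcc -o omp_hello_amd -fopenmp omp_hello.c
#   fi
```

Keeping an architecture hint in the binary name, such as the _amd suffix used in the example below, helps to avoid mixing up builds made on the login nodes with builds made on node802.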

Example

Here is an example of what a Slurm batch script for this node could look like. It includes a HelloWorld OpenMP program to demonstrate compiling on the node and using the local scratch storage.

If you are new to HPC, ALICE or Slurm, have a look at the User Guides.

Batch script

#!/bin/bash
#SBATCH --partition=mem_mi
#SBATCH --reservation=mem_mi
#SBATCH --job-name=test_job
#SBATCH --time=0-00:02:00
#SBATCH --output=%x_%j.out
#SBATCH --nodes=1
#SBATCH --ntasks=5
#SBATCH --cpus-per-task=3
#SBATCH --mem=10G
#SBATCH --mail-user="your-email-address"
#SBATCH --mail-type="ALL"

module load ALICE/default
module load OpenMPI/4.0.5-GCC-9.3.0

echo "#### Test started"

# return the name of the node
echo "## Which node is this: $HOSTNAME"

# check the number of cores (ntasks*cpus-per-task)
echo "How many cores do I have access to: ${SLURM_CPUS_ON_NODE}"

# Just to check that the AMD software stack is loaded
echo "## Am I loading from the right module path:"
echo ${MODULEPATH%%:*}

# get the current working directory
CWD=$(pwd)

echo "## Where am I: ${CWD}"

# check out the node's local scratch
echo "## My local scratch space on the node is: ${SCRATCH}"
cd $SCRATCH

echo "## Let us go there: $(pwd)"

# In case the file has already been compiled
# and stored in $CWD, the following six lines
# are not necessary  
echo "## Let us copy the C source file to it"
cp $CWD/omp_hello.c $SCRATCH/  
echo "## Is the file there?"
ls -la omp_hello.c
echo "## Now we compile it on the node"
gcc -o omp_hello_amd -fopenmp omp_hello.c

# In case the file is already compiled
# the next four lines would copy it
# and check that it is there:
#echo "## Let us copy the compiled C programme to it"
#cp $CWD/omp_hello_amd $SCRATCH/
#echo "## Is the file there?"
#ls -la omp_hello_amd

echo "## Let us run it"
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./omp_hello_amd

# Copy those files back to shared scratch or home
# that should be kept for later.
# Here, it is just the compiled C programme.
# It does not need to be copied back of course
# if it came from shared scratch or home.
echo "## Saving files that should be saved."
cp $SCRATCH/omp_hello_amd $CWD/

echo "## Now that this is done, I want to go home"
cd $CWD
echo "## Good to be back $(pwd)"

echo "#### Test finished"
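
A note on the thread count used in the batch script: Slurm sets SLURM_CPUS_PER_TASK only when --cpus-per-task is requested, so a script that omits that option would export an empty OMP_NUM_THREADS. The following sketch shows one possible way to guard against this. The helper name omp_threads and the fallback value of 1 are assumptions made for illustration, not ALICE recommendations.

```shell
#!/bin/bash
# Hypothetical helper: print the OpenMP thread count to use, i.e. the
# Slurm per-task CPU count if it is set and non-empty, otherwise 1.
omp_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

export OMP_NUM_THREADS="$(omp_threads)"
echo "## Using ${OMP_NUM_THREADS} OpenMP thread(s) per task"
```

With the settings from the batch script above (--cpus-per-task=3), this would export a value of 3 for each of the five tasks.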

OpenMP script

Here is the content of the file omp_hello.c from https://computing.llnl.gov/tutorials/openMP/samples/C/omp_hello.c

/******************************************************************************
 * FILE: omp_hello.c
 * DESCRIPTION:
 *   OpenMP Example - Hello World - C/C++ Version
 *   In this simple example, the master thread forks a parallel region.
 *   All threads in the team obtain their unique thread number and print it.
 *   The master thread only prints the total number of threads.  Two OpenMP
 *   library routines are used to obtain the number of threads and each
 *   thread's number.
 * AUTHOR: Blaise Barney  5/99
 * LAST REVISED: 04/06/05
 ******************************************************************************/
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[])
{
int nthreads, tid;

/* Fork a team of threads giving them their own copies of variables */
#pragma omp parallel private(nthreads, tid)
  {

  /* Obtain thread number */
  tid = omp_get_thread_num();
  printf("Hello World from thread = %d\n", tid);

  /* Only master thread does this */
  if (tid == 0)
    {
    nthreads = omp_get_num_threads();
    printf("Number of threads = %d\n", nthreads);
    }

  }  /* All threads join master thread and disband */

}