
SLURM-Memory

From ALICE Documentation

Revision as of 13:35, 7 April 2020 by Dijkbvan

Memory

All programs require a certain amount of memory to function properly. To see how much memory your program needs, check its documentation or run it in an interactive session and profile it with the top command. To specify the memory for your job, use the --mem-per-cpu option.

--mem-per-cpu=<number>

Where <number> is the amount of memory per processor core. The default is 1 GB. You can append a unit suffix (K, M, G or T); without a suffix, Slurm interprets the value in megabytes.
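As a sketch, a batch script requesting 4 GB per core might look like the following (the job name, partition name and program are placeholders, not names defined by ALICE):

```shell
#!/bin/bash
#SBATCH --job-name=mem_demo    # illustrative job name
#SBATCH --partition=cpu-short  # placeholder; substitute a partition that exists on ALICE
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=4G       # 4 GB per core, so 16 GB in total for this job

./my_program                   # placeholder for your own executable
```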

Walltime

If you do not define how long your job will run, it will default to 30 minutes. The maximum walltime that is available depends on the partition that you use.

To specify the walltime for your job, use the time option.

--time=<hh:mm:ss>

Here, <hh:mm:ss> represents the hours, minutes and seconds requested. If a job does not complete within the specified runtime, Slurm terminates it.
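For example, a job that should be allowed to run for at most two hours could be submitted with a script like this sketch (job name and program are placeholders):

```shell
#!/bin/bash
#SBATCH --job-name=time_demo  # illustrative job name
#SBATCH --time=02:00:00       # the job is terminated after 2 hours

./my_program                  # placeholder for your own executable
```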

GPUs

Some programs can take advantage of the unique hardware architecture of a graphics processing unit (GPU). Check your program's documentation for GPU support. A number of nodes on the ALICE cluster are equipped with multiple GPUs each (see the hardware description). We strongly recommend that you always specify how many GPUs your job needs, so that Slurm can schedule other jobs that use the remaining GPUs on the same node.

To request a node with GPUs, choose one of the gpu partitions and add one of the following lines to your script:

--gres=gpu:<number>

or

--gres=gpu:<GPU_type>:<number>

where:

  • <number> is the number of GPUs per node requested.
  • <GPU_type> is one of the following: 2080ti

Just like for CPUs, you can specify the amount of system memory per allocated GPU with

--mem-per-gpu=<number>
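Putting this together, a sketch of a script requesting two 2080ti GPUs could look like the following (the partition name, job name and program are placeholder assumptions):

```shell
#!/bin/bash
#SBATCH --job-name=gpu_demo      # illustrative job name
#SBATCH --partition=gpu-short    # placeholder; substitute one of ALICE's gpu partitions
#SBATCH --gres=gpu:2080ti:2      # two 2080ti GPUs on one node
#SBATCH --mem-per-gpu=8G         # 8 GB of system memory per allocated GPU

./my_gpu_program                 # placeholder for your own GPU-enabled executable
```

Note that --mem-per-gpu requests system (host) memory per allocated GPU; the memory on the GPU card itself comes with the card and is not requested separately.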

Network/Cluster

Some programs solve problems that can be broken up into pieces and distributed across multiple computers that communicate over a network. This strategy often delivers greater performance. ALICE compute nodes are connected by a low-latency Infiniband network (100 Gbps). To see these performance increases, your application or code must be specifically designed to take advantage of such a low-latency network.

To request a specific network, you can add the following line to your resource request:

--constraint=<network>

where <network> is IB.
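For example, a multi-node MPI job could request nodes on the Infiniband network with a script like this sketch (job name, task count and program are illustrative):

```shell
#!/bin/bash
#SBATCH --job-name=mpi_demo  # illustrative job name
#SBATCH --constraint=IB      # request nodes on the Infiniband network
#SBATCH --ntasks=32          # 32 MPI ranks, possibly spread over several nodes

srun ./my_mpi_program        # placeholder for your own MPI executable
```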

Other

Besides the network a compute node is on, there may be other features of the node that you need to specify for your program to run efficiently. Below is a table of commonly requested node attributes that can be passed to the --constraint option of the sbatch and salloc commands.

Constraint   What It Does
avx/avx2     Advanced Vector eXtensions, optimized math operations
Xeon         Request compute nodes with Intel Xeon processors
Opteron      Request compute nodes with AMD Opteron processors

Note: ALICE currently has avx/avx2 only.
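As a final sketch, a job that depends on AVX2 instructions could be pinned to suitable nodes like this (job name and program are placeholders):

```shell
#!/bin/bash
#SBATCH --job-name=avx_demo  # illustrative job name
#SBATCH --constraint=avx2    # only schedule on nodes with AVX2 support

./my_vectorized_program      # placeholder for your own executable
```

Multiple features can be combined with an ampersand, e.g. --constraint="IB&avx2" to require both.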