From ALICE Documentation

Revision as of 07:53, 3 April 2020 by Dijkbvan

This wiki is the main documentation source for information about the Academic Leiden Interdisciplinary Cluster Environment (ALICE) cluster. ALICE is the research computing facility of the partnership Leiden University and Leiden University Medical Center. It is available to any researcher from both partners. In this wiki, we will introduce the ALICE cluster in detail. We assume you know your way around in a terminal session and that you will responsibly use the cluster and respect the available resources. Remember that a cluster runs many different processes submitted by many different users. If you are not sure what you are doing, ask your administrator before submitting very intensive jobs.


High-Performance Computing (HPC), previously the domain of theoretical scientists and computer and software developers, is becoming ever more important as a research tool in many research areas. An HPC facility that provides serious computational capabilities, combined with easy and flexible local access, is a strong advantage for these research areas. ALICE is the HPC facility that answers those needs for Leiden University (LU) and Leiden University Medical Center (LUMC). It is available to all researchers and students from both LU and LUMC.

The ALICE facility as currently implemented is a first-phase edition of what will become a larger hybrid HPC facility for research. It exceeds the capabilities of what individual institutes can build and provides a stepping stone to the larger national facilities. Although the current implementation is spread over two data centres (LU and LUMC), it is a single facility.

The facility aims to be an easily accessible, easily usable system with extensive local support at all levels of expertise. Given the expected diverse use, diversity is implemented in all aspects of computing, namely: the number of CPUs and GPUs and the ratio of these two numbers; the amount of main memory available to the CPUs; the data storage size and location; and the speed of the network.

ALICE is not only a sophisticated production machine but also a tool for teaching all aspects of HPC and a training machine on which young researchers can prepare themselves for national and international HPC.

Future plans

ALICE will be expanded over the coming years. Apart from our own expansion plans, we are always open to collaborate with other groups/institutes on expanding and improving ALICE.

The expansions that are currently being discussed include:

  • Expansion of temporary storage (first half 2021)
    • We are in the process of getting a 250 TB parallel storage system that will run BeeGFS.
  • Upgrading login nodes with GPUs (Q2 2021)
    • We are planning to add one NVIDIA Tesla T4 to each login node.
  • Additional GPU nodes (estimated second half of 2021)
  • Additional CPU nodes (estimated Q3 2022)


Off to research computing Wonderland

ALICE (Academic Leiden Interdisciplinary Cluster Environment) is the high-performance computing (HPC) facility of the partnership between Leiden University and Leiden University Medical Center (LUMC). It is available to any researcher from both partners. Leiden University and LUMC aim to help deliver cutting edge research using innovative technology within the broad area of data-centric HPC. Both partners are responsible for the hosting, system support, scientific support and service delivery of several large super-computing and research data storage resources for the Leiden research community.

This wiki is the main source of documentation about the ALICE cluster.

Costs overview

Currently, access to the ALICE cluster and related services is provided free of charge to all researchers and students at LU and LUMC.

  • System description

Hardware description (ED)

Overview of the cluster

Conceptual View of ALICE

The ALICE cluster is a hybrid cluster consisting of

  • 2 login nodes (4 TFlops)
  • 20 CPU nodes (40 TFlops)
  • 10 GPU nodes (40 GPUs, 20 TFlops CPU + 536 TFlops GPU)
  • 1 High Memory CPU node (4 TFlops)
  • Storage Device (31 * 15 + 70 = 535 TB)

In summary: 604 TFlops, 816 cores (1632 threads), 14.4 TB RAM.
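The TFlops total and the storage figure follow directly from the numbers in the list above. As a small sketch, the arithmetic can be reproduced in the shell; the variable names are made up for this example, and the values are copied from the list:

```shell
# Peak performance per component in TFlops, copied from the list above.
# The variable names are illustrative only.
login=4
cpu=40
gpu_host=20      # CPU part of the GPU nodes
gpu_cards=536    # GPU part of the GPU nodes
highmem=4

total_tflops=$((login + cpu + gpu_host + gpu_cards + highmem))

# Storage in TB, using the expression given in the list: 31 * 15 + 70.
storage_tb=$((31 * 15 + 70))

echo "Total: ${total_tflops} TFlops, storage: ${storage_tb} TB"
```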

ALICE has a second high memory node. This node is not included above because it is only available to the research group that purchased it.

You can find a more comprehensive description of the individual components of ALICE in the section Hardware Description. Also see a photo gallery of the hardware.

ALICE is a pre-configuration system for the university to gain experience with managing, supporting and operating a university-wide HPC system. Once the system and governance have proven to be a functional research asset, it will be extended and continued for the coming years.

The descriptions are for the configuration that is housed partly in the data centre at LMUY and partly in the data centre at Leiden University Medical Center (LUMC).

Login nodes

The cluster has two login nodes, also called head nodes. These are the nodes to which the users of ALICE can log in. These login nodes can be used to develop your HPC code and test/debug the programs. From the login nodes, you initiate the calls to the Slurm queuing system, spawning your compute jobs. The login nodes are also used to transfer data between the ALICE storage device and the university research storage data stores.

The login nodes have the following configuration:

1x Huawei FusionServer 2288H V5
2x Xeon Gold 6126 2.6GHz 12 core, hyperthreading disabled
1x 384 GB RAM
2x 240 GB SSD RAID 1 (OS disk)
3x 8 TB SATA RAID 5 (data disk)
1x Mellanox ConnectX-5 (EDR) Infiniband

CPU nodes

The 20 CPU-based compute nodes (node001 - node020) have the following configuration:

1x Huawei FusionServer X6000 V5
2x Xeon Gold 6126 2.6GHz 12 core, hyperthreading disabled
1x 384 GB RAM
2x 240 GB SSD RAID 1 (OS disk)
3x 8 TB SATA RAID 5 (data disk)
1x Mellanox ConnectX-5 (EDR) Infiniband

Total: 480 cores @ 2.6GHz = 1248 coreGHz
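The coreGHz totals given for the compute nodes are the number of cores multiplied by the clock frequency. A minimal sketch of that arithmetic, using awk for the floating-point multiplication; the function name core_ghz is made up for this example:

```shell
# Hypothetical helper:
# nodes x sockets per node x cores per socket x clock frequency (GHz).
core_ghz() {
    awk -v n="$1" -v s="$2" -v c="$3" -v f="$4" \
        'BEGIN { printf "%.1f\n", n * s * c * f }'
}

core_ghz 20 2 12 2.6    # the 20 CPU nodes
core_ghz 10 2 12 2.6    # the 10 GPU nodes (CPU part only)
core_ghz 1 4 6 3.4      # the high memory node node801
```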

GPU nodes

The 10 GPU-based compute nodes (node851 - node860) have the following configuration:

1x Huawei FusionServer G5500 / G560 V5
2x Xeon Gold 6126 2.6GHz 12 core, hyperthreading disabled
1x 384 GB RAM
4x PNY GeForce RTX 2080TI GPU
2x 240 GB SSD RAID 1 (OS disk)
3x 8 TB SATA RAID 5 (data disk)

Total: 240 cores @ 2.6GHz = 624 coreGHz

High Memory Node

The High Memory compute node (node801) has the following configuration:

1x Dell PowerEdge R840
4x Xeon Gold 6128 3.4GHz 6 core, hyperthreading disabled
1x 2048 GB RAM
2x 240 GB SSD RAID 1 (OS disk)
13x 2 TB SATA RAID 5 (data disk)

Total: 24 cores @ 3.4GHz = 81.6 coreGHz

New February 2021: A second High Memory compute node (node802) has the following configuration:

1x Supermicro SERVERline Individual
2x AMD EPYC 7662 2.0GHz 64 core, hyperthreading enabled
1x 4096 GB RAM
2x 256 GB SSD RAID 1 (OS disk)
2x 10 TB SATA RAID 0 (data disk)

Total: 128 cores @ 2.0GHz = 256 coreGHz

Node802 was purchased by a research group from the Mathematical Institute. It is only available to that research group.

Network configuration

Below each network segment is described in some detail.

Campus Network

The campus network provides the connectivity to access the ALICE cluster from outside. The part of the campus network that enters the ALICE cluster is shielded from the outside world and can only be reached through an SSH gateway. This part of the network provides user access to the login nodes. See section Login to cluster for a detailed description of how to access the login nodes from your desktop.

Command Network

This network is used by the job queuing system Slurm and by interactive jobs to transfer command-like information between the login nodes and the compute nodes.

Data network

This network is only for data transfer to and from the storage device. All data belonging to the shares /home, /software and /data is transported over this network, which relieves the other networks of this traffic. The mounts of these shares are automatically attached to the data network. As a user, you do not have to worry that data transfer might interfere with job queuing or inter-process communication.

Infiniband Network

This fast (100 Gbps) network provides very low latency inter-nodal communication for your parallel jobs. MPI automatically selects this network for inter-nodal communication, so you do not need to configure anything yourself.


Data Storage Device

The current configuration of ALICE is in a pre-configuration phase. For the moment, fast data storage is based on a simple NFS server. A full-blown distributed file system will be put in place in the second half of 2020.

1x PowerEdge R7425
2x AMD Epyc 7261
1x 128 GB RAM
2x 240 GB SSD RAID 1 (OS disk)
10x 8 TB SATA RAID 5 (data disk)


Before we start using individual software packages, we need to understand why multiple versions of the software are available on ALICE systems and why users need to have a way to control which version they are using. The three biggest factors are:

  • software incompatibilities;
  • versioning;
  • dependencies.

Software incompatibility is a major headache for programmers. Sometimes the presence (or absence) of a software package will break others that depend on it. Two of the most famous examples are Python 2 and 3 and C compiler versions. Python 3 famously provides a python command that conflicts with the one provided by Python 2. Software compiled against a newer version of the compiler's runtime libraries and then run on a system where that version is not present will fail with a nasty 'GLIBCXX_3.4.20' not found error, for instance.

Software versioning is another common issue. A team might depend on a certain package version for their research project. If the software version were to change (for instance, if a package was updated), it might affect their results. Having access to multiple software versions allows a set of researchers to prevent software versioning issues from affecting their results.

Dependencies are where a particular software package (or even a particular version) depends on having access to another software package (or even a particular version of another software package). For example, the VASP materials science software may depend on having a particular version of the FFTW (Fastest Fourier Transform in the West) software library available for it to work. Environment modules are the solution to these problems.

Environment modules

A module is a self-contained description of a software package - it contains the settings required to run a software package and, usually, encodes required dependencies on other software packages.

There are a number of different environment module implementations commonly used on HPC systems: the two most common are TCL modules and Lmod. Both of these use similar syntax and the concepts are the same so learning to use one will allow you to use whichever is installed on the system you are using. In both implementations the module command is used to interact with environment modules. An additional sub-command is usually added to the command to specify what you want to do. For a list of sub-commands you can use module -h or module help. As for all commands, you can access the full help on the man pages with man module.

List of currently loaded modules

On login, you usually start out with a default set of modules.

The module list command shows which modules you currently have loaded in your environment. This is what you will most likely see after logging into ALICE.

  [me@nodelogin02~]$ module list
  Currently Loaded Modules:
    1) shared   2) DefaultModules   3) gcc/8.2.0   4) slurm/19.05.1

You can see that by default the modules for Slurm and the gcc compiler are loaded. The number after the slash is the version number.
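Because module names follow this name/version pattern, they can be taken apart with ordinary shell parameter expansion. The following is a minimal sketch; the helper names module_name and module_version are hypothetical and are not part of the module command:

```shell
# Hypothetical helpers (not part of the module system) that split a
# module identifier such as "gcc/8.2.0" at the first slash.
module_name() {
    printf '%s\n' "${1%%/*}"              # everything before the first slash
}

module_version() {
    case "$1" in
        */*) printf '%s\n' "${1#*/}" ;;   # everything after the first slash
        *)   printf '\n' ;;               # no slash: the name carries no version
    esac
}

module_name gcc/8.2.0
module_version gcc/8.2.0
module_name shared
```

Modules such as shared and DefaultModules carry no version in their name, which is why the second helper prints an empty line for them.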

List available modules

To see the available modules, use module -d avail. With the -d option, you will only get the default versions of the modules. For various software packages, older or other versions are also available and can be used if necessary. You can see all versions by omitting the -d option.

 [me@nodelogin02~]$ module -d avail
  ------------------------------------------------ /cm/shared/easybuild/modules/all -------------------------------------------------
  AMUSE-Miniconda2/4.7.10                                         VTK/8.2.0-foss-2019b-Python-3.7.4
  AMUSE-VADER/12.0.0-foss-2018a-Python-2.7.14                     WebKitGTK+/2.24.1-GCC-8.2.0-2.31.1
  AMUSE/12.0.0-foss-2018a-Python-2.7.14                           X11/20190717-GCCcore-8.3.0
  ATK/2.32.0-GCCcore-8.2.0                                        XML-Parser/2.44_01-GCCcore-7.3.0-Perl-5.28.0
  Autoconf/2.69-GCCcore-8.3.0                                     XZ/5.2.4-GCCcore-8.3.0
  Automake/1.16.1-GCCcore-8.3.0                                   Yasm/1.3.0-GCCcore-8.3.0
  Autotools/20180311-GCCcore-8.3.0                                ZeroMQ/4.3.2-GCCcore-8.2.0
  Bazel/0.20.0-GCCcore-8.2.0                                      amuse-framework/12.0.0-foss-2018a-Python-2.7.14
  Bison/3.3.2                                                     at-spi2-atk/2.32.0-GCCcore-8.2.0
  Boost.Python/1.67.0-foss-2018b-Python-3.6.6                     at-spi2-core/2.32.0-GCCcore-8.2.0
  Boost/1.71.0-gompi-2019b                                        binutils/2.32
  etc etc etc .......

If you are searching for a specific software package or tool you can search for the full module name like this:

 [me@nodelogin02~]$ module avail python
  ---------------------------------------- /cm/shared/easybuild/modules/all -----------------------------------------
  AMUSE/13.1.0-foss-2018a-Python-3.6.4                    (D)
  Biopython/1.75-foss-2019b-Python-3.7.4                  (D)
  Cython/0.29.3-foss-2019a-Python-3.7.2                   (D)
  Docutils/0.9.1-foss-2018a-Python-3.6.4                  (D)
  IPython/7.7.0-foss-2019a-Python-3.7.2                   (D)
  Meson/0.51.2-GCCcore-8.3.0-Python-3.7.4                 (D)
  NLTK/3.2.4-foss-2019a-Python-3.7.2                      (D)
  Python/3.7.4-GCCcore-8.3.0                              (D)
  etc etc etc .......

In the above output, you can see that there are modules with the flag "(D)". This marks the default module of a software package for which modules for different versions exist.

If you want to get more information about a specific module, you can use the whatis sub-command:

  [me@nodelogin02~]$ module whatis Python/3.7.4-GCCcore-8.3.0
  Python/3.7.4-GCCcore-8.3.0                            : Description: Python is a programming language that lets you work more quickly and
  integrate your systems more effectively.
  Python/3.7.4-GCCcore-8.3.0                            : Homepage:
  Python/3.7.4-GCCcore-8.3.0                            : URL:
  Python/3.7.4-GCCcore-8.3.0                            : Extensions: alabaster-0.7.12, asn1crypto-0.24.0, atomicwrites-1.3.0, attrs-19.1.0,
  Babel-2.7.0, bcrypt-3.1.7, bitstring-3.1.6, blist-1.3.6, certifi-2019.9.11, cffi-1.12.3, chardet-3.0.4, Click-7.0, cryptography-2.7,
  Cython-0.29.13, deap-1.3.0, decorator-4.4.0, docopt-0.6.2, docutils-0.15.2, ecdsa-0.13.2, future-0.17.1, idna-2.8, imagesize-1.1.0,
  importlib_metadata-0.22, ipaddress-1.0.22, Jinja2-2.10.1, joblib-0.13.2, liac-arff-2.4.0, MarkupSafe-1.1.1, mock-3.0.5,
  more-itertools-7.2.0, netaddr-0.7.19, netifaces-0.10.9, nose-1.3.7, packaging-19.1, paramiko-2.6.0, pathlib2-2.3.4, paycheck-1.0.2,
  pbr-5.4.3, pip-19.2.3, pluggy-0.13.0, psutil-5.6.3, py-1.8.0, py_expression_eval-0.3.9, pyasn1-0.4.7, pycparser-2.19, pycrypto-2.6.1,
  Pygments-2.4.2, PyNaCl-1.3.0, pyparsing-2.4.2, pytest-5.1.2, python-dateutil-2.8.0, pytz-2019.2, requests-2.22.0, scandir-1.10.0,
  setuptools-41.2.0, setuptools_scm-3.3.3, six-1.12.0, snowballstemmer-1.9.1, Sphinx-2.2.0, sphinxcontrib-applehelp-1.0.1,
  sphinxcontrib-devhelp-1.0.1, sphinxcontrib-htmlhelp-1.0.2, sphinxcontrib-jsmath-1.0.1, sphinxcontrib-qthelp-1.0.2,
  sphinxcontrib-serializinghtml-1.1.3, sphinxcontrib-websupport-1.1.2, tabulate-0.8.3, ujson-1.35, urllib3-1.25.3, virtualenv-16.7.5,
  wcwidth-0.1.7, wheel-0.33.6, xlrd-1.2.0, zipp-0.6.0

Load modules

To load a software module, use module load. In the example below, we will use Python 3.

Initially, Python 3 is not loaded and therefore not available for use. We can test this with the which command, which looks for programs the same way that Bash does and tells us where a particular piece of software is stored.

  [me@nodelogin01~]$ which python3
  /usr/bin/which: no python3 in (/cm/shared/apps/slurm/19.05.1/sbin:/cm/shared/apps/slurm/19.05.1/bin:/cm/local/apps/gcc/8.2.0/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/me/.local/bin:/home/me/bin)

We can load the python3 command with module load:

  [me@nodelogin01 ~]$ module load Python/3.7.2-GCCcore-8.2.0
  [me@nodelogin01 ~]$ which python3
  /cm/shared/easybuild/software/Python/3.7.2-GCCcore-8.2.0/bin/python3

So what just happened? To understand the output, first we need to understand the nature of the $PATH environment variable. $PATH is a special environment variable that controls where a Linux operating system (OS) looks for software. Specifically $PATH is a list of directories (separated by :) that the OS searches through for a command. As with all environment variables, we can print it using echo.

  [me@nodelogin01 ~]$ echo $PATH
  /cm/shared/easybuild/software/Python/3.7.2-GCCcore-8.2.0/bin:/cm/shared/easybuild/software/XZ/5.2.4-GCCcore-8.2.0/bin:/cm/shared/easybuild/software/SQLite/3.27.2-GCCcore-8.2.0/bin:/cm/shared/easybuild/software/Tcl/8.6.9-GCCcore-8.2.0/bin:/cm/shared/easybuild/software/libreadline/8.0-GCCcore-8.2.0/bin:/cm/shared/easybuild/software/ncurses/6.1-GCCcore-8.2.0/bin:/cm/shared/easybuild/software/bzip2/1.0.6-GCCcore-8.2.0/bin:/cm/shared/easybuild/software/GCCcore/8.2.0/bin:/cm/shared/apps/slurm/19.05.1/sbin:/cm/shared/apps/slurm/19.05.1/bin:/cm/local/apps/gcc/8.2.0/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/me/.local/bin:/home/me/bin

You will notice a similarity to the output of the which command. The difference is at the beginning: the /cm/shared/easybuild/software/Python/3.7.2-GCCcore-8.2.0/bin directory now comes first, followed by the bin directories of the packages that Python depends on.
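A long $PATH is easier to read with one directory per line. The sketch below uses only standard tools; the helper name path_entries is an assumption made for illustration:

```shell
# Hypothetical helper: print each directory of a colon-separated search
# path on its own line, in the order in which the shell searches them.
# Without an argument it uses the current $PATH.
path_entries() {
    printf '%s\n' "${1:-$PATH}" | tr ':' '\n'
}

path_entries                  # all directories in the current $PATH
path_entries | sed -n '1p'    # the directory that is searched first
```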

When we used module load Python/3.7.2-GCCcore-8.2.0, it added this directory to the beginning of our $PATH. Let us examine what is there:

   [me@nodelogin01 ~]$ ls /cm/shared/easybuild/software/Python/3.7.2-GCCcore-8.2.0/bin
   2to3		  futurize	 pip	      pytest		 python3-config	sphinx-apidoc
   2to3-3.7	  idle3		 pip3	      py.test		 pyvenv		sphinx-autogen
   chardetect	  idle3.7	 pip3.7       python		 pyvenv-3.7	sphinx-build
   cygdb	  netaddr	 pybabel      python3		sphinx-quickstart
   cython	  nosetests	 __pycache__  python3.7		tabulate
   cythonize	  nosetests-3.7  pydoc3       python3.7-config		virtualenv
   easy_install	  pasteurize	 pydoc3.7     python3.7m		wheel
   easy_install-3.7  pbr	 pygmentize   python3.7m-config

Taking this to its conclusion, module load adds software to your $PATH. It “loads” software.
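The effect of prepending a directory to $PATH can be reproduced without the module system. The sketch below is only an illustration: the temporary directory and the command name hello-alice are made up, and a real module load can do more than edit $PATH, for example by setting other environment variables.

```shell
# Create a throwaway directory containing a tiny executable.
# The directory and the name "hello-alice" are made up for this demo.
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from the demo directory"\n' > "$demo_dir/hello-alice"
chmod +x "$demo_dir/hello-alice"

old_path=$PATH

# Before: the shell cannot find the command.
command -v hello-alice || echo "hello-alice: not found"

# Prepend the directory, as module load does for a package's bin directory.
PATH="$demo_dir:$PATH"
found=$(command -v hello-alice)
echo "$found"

# Restoring the old value undoes the change; then clean up.
PATH=$old_path
rm -rf "$demo_dir"
```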

A special note on this: module load also loads required software dependencies. If you compare the output below with what you had when you first logged in to ALICE, you will notice that several other modules have been loaded automatically, because the Python module depends on them.

   [me@nodelogin01 ~]$ module list
   Currently Loaded Modules:
     1) shared           5) GCCcore/8.2.0               9) libreadline/8.0-GCCcore-8.2.0  13) GMP/6.1.2-GCCcore-8.2.0
     2) DefaultModules   6) bzip2/1.0.6-GCCcore-8.2.0  10) Tcl/8.6.9-GCCcore-8.2.0        14) libffi/3.2.1-GCCcore-8.2.0
     3) gcc/8.2.0        7) zlib/1.2.11-GCCcore-8.2.0  11) SQLite/3.27.2-GCCcore-8.2.0    15) Python/3.7.2-GCCcore-8.2.0
     4) slurm/19.05.1    8) ncurses/6.1-GCCcore-8.2.0  12) XZ/5.2.4-GCCcore-8.2.0

Also a note of warning: When you load several modules, it is possible that their dependencies can cause conflicts and problems later on. It is best to always check what other modules have been automatically loaded.
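One quick way to spot a likely conflict is to check whether the loaded modules refer to more than one GCCcore version. The following is a rough sketch under two assumptions: the module names follow the naming pattern visible in the listings above, and module list writes to standard error, hence the 2>&1 in the usage comment. The function name toolchain_versions and the sample line are made up for this example.

```shell
# Hypothetical helper: read `module list` output on stdin and print each
# distinct GCCcore version that appears in the module names.
toolchain_versions() {
    grep -o 'GCCcore[-/][0-9][0-9.]*' | sed 's|^GCCcore[-/]||' | sort -u
}

# Usage on the cluster (assumes module list prints to standard error):
#   module list 2>&1 | toolchain_versions
# More than one line of output suggests that toolchain versions are mixed.

# Made-up sample line that mixes two GCCcore versions:
sample='1) shared  5) GCCcore/8.2.0  6) bzip2/1.0.6-GCCcore-8.2.0  7) Python/3.7.4-GCCcore-8.3.0'
printf '%s\n' "$sample" | toolchain_versions
```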

Unload modules

The command module unload “un-loads” a module. For the above example:

 [me@nodelogin01 ~]$ module unload Python/3.7.2-GCCcore-8.2.0
 [me@nodelogin01 ~]$ module list
 Currently Loaded Modules:
   1) shared           5) GCCcore/8.2.0               9) libreadline/8.0-GCCcore-8.2.0  13) GMP/6.1.2-GCCcore-8.2.0
   2) DefaultModules   6) bzip2/1.0.6-GCCcore-8.2.0  10) Tcl/8.6.9-GCCcore-8.2.0        14) libffi/3.2.1-GCCcore-8.2.0
   3) gcc/8.2.0        7) zlib/1.2.11-GCCcore-8.2.0  11) SQLite/3.27.2-GCCcore-8.2.0
   4) slurm/19.05.1    8) ncurses/6.1-GCCcore-8.2.0  12) XZ/5.2.4-GCCcore-8.2.0

Important: Currently, unloading a module does not unload its dependencies (as you can see from the above output).

If you want to remove all the modules that are currently loaded, you can use the command module purge:

 [me@nodelogin01 ~]$ module purge
 [me@nodelogin01 ~]$ module list
 No modules loaded

Note that this command will also unload the modules loaded by default on login, including Slurm. You can either load those modules again manually or source your bashrc with source ~/.bashrc.

File storage