About ALICE
From ALICE Documentation
Revision as of 14:55, 6 April 2020
This wiki is the main source of documentation for the Academic Leiden Interdisciplinary Cluster Environment (ALICE) cluster. ALICE is the research computing facility of the partnership between Leiden University and Leiden University Medical Center, and it is available to any researcher from both partners. In this wiki, we introduce the ALICE cluster in detail. We assume that you know your way around a terminal session and that you will use the cluster responsibly and respect the available resources. Remember that a cluster runs many different processes submitted by many different users; if you are not sure what you are doing, ask your administrator before submitting very intensive jobs.
- 1 Why ALICE
- 2 Future plans
- 3 Costs overview
- 4 Publications
- 5 Hardware description
- 6 Software
- 7 File and I/O Management
- 8 Summary of available file systems
- 9 The home file system
- 10 The scratch-shared file system on /data
- 11 The local scratch file system on /scratchdata
- 12 Software file system
- 13 Compute Local
- 14 Best Practices - Shared File System
High-Performance Computing (HPC), previously the domain of theoretical scientists and of computer and software developers, is becoming ever more important as a research tool in many research areas. An HPC facility that provides serious computational capabilities, combined with easy and flexible local access, is a strong advantage for these research areas. ALICE is the HPC facility that answers those needs for Leiden University (LU) and Leiden University Medical Center (LUMC). It is available to all researchers and students from both LU and LUMC.
The ALICE facility as currently implemented is the first phase of what will become a larger hybrid HPC facility for research, exceeding the capabilities of what individual institutes can build and providing a stepping stone to the larger national facilities. Although the current implementation is spread over two data centres (LU and LUMC), it operates as a single system.
The facility aims to be an easily accessible and usable system with extensive local support at all levels of expertise. Given the expected diverse use, diversity is built into all aspects of the system: the number of CPUs and GPUs and the ratio between the two; the amount of memory per CPU; the size and location of data storage; and the speed of the network.
ALICE is not only a sophisticated production machine but also a tool for teaching all aspects of HPC and a training machine on which young researchers can prepare themselves for national and international HPC facilities.
ALICE (Academic Leiden Interdisciplinary Cluster Environment) is the high-performance computing (HPC) facility of the partnership between Leiden University and Leiden University Medical Center (LUMC). It is available to any researcher from both partners. Leiden University and LUMC aim to help deliver cutting edge research using innovative technology within the broad area of data-centric HPC. Both partners are responsible for the hosting, system support, scientific support and service delivery of several large super-computing and research data storage resources for the Leiden research community.
This wiki is the main source of documentation about the ALICE cluster.
ALICE will be expanded over the coming years. Apart from our own expansion plans, we are always open to collaborating with other groups/institutes on expanding and improving ALICE.
The expansions currently being discussed include:
- Expansion of temporary storage (first half 2021)
- We are in the process of acquiring a 250 TB parallel storage system that will run BeeGFS.
- Upgrading login nodes with GPUs (Q2 2021)
- We are planning to add one NVIDIA Tesla T4 to each login node.
- Additional GPU nodes (estimated second half of 2021)
- Additional CPU nodes (estimated Q3 2022)
Currently, access to the ALICE cluster and related services is provided free of charge to all researchers and students at LU and LUMC.
Articles that acknowledge the use of ALICE
- Effects of stellar density on the photoevaporation of circumstellar discs, Concha-Ramirez, F. et al., MNRAS, 501, 1782 (February 2021), DOI: https://doi.org/10.1093/mnras/staa3669
- Lucky planets: how circum-binary planets survive the supernova in one of the inner-binary components, Fagginger Auer, F. & Portegies Zwart, S., eprint arXiv:2101.08033, Submitted to SciPost Astronomy (January 2021), https://ui.adsabs.harvard.edu/link_gateway/2021arXiv210108033F/arxiv:2101.08033
- Trimodal structure of Hercules stream explained by originating from bar resonances, Asano, T. et al., MNRAS, 499, 2416 (December 2020), DOI: https://doi.org/10.1093/mnras/staa2849
- Oort cloud Ecology II: Extra-solar Oort clouds and the origin of asteroidal interlopers, Portegies Zwart, S., eprint arXiv:2011.08257, accepted for publication by A&A, (November 2020), https://ui.adsabs.harvard.edu/link_gateway/2020arXiv201108257P/arxiv:2011.08257
- The ecological impact of high-performance computing in astrophysics. Portegies Zwart, S., Nature Astronomy, 4, 819–822 (September 2020), DOI: https://doi.org/10.1038/s41550-020-1208-y.
- Better Distractions: Transformer-based Distractor Generation and Multiple Choice Question Filtering, Offerijns, J., Verberne, S., Verhoef, T., eprint arXiv:2010.09598, (October 2020), https://arxiv.org/abs/2010.09598
Leiden researchers and their use of HPC
- Identifying Earth-impacting asteroids using an artificial neural network. John D. Hefele, Francesco Bortolussi and Simon Portegies Zwart. Astronomy & Astrophysics, February 2020.
News articles featuring ALICE
- Hazardous Object Identifier: Supercomputer Helps to Identify Dangerous Asteroids, Oliver Peckham, HPCwire, 04 March 2020, link
- Elf reuzestenen op ramkoers met de aarde?, Annelies Bes, 13 February 2020, Kijk Magazine, link
- Leidse sterrenkundigen ontdekken aardscheerders-in-spé, NOVA, 12 February 2020, link
Hardware description
Overview of the cluster
The ALICE cluster is a hybrid cluster consisting of
- 2 login nodes (4 TFlops)
- 20 CPU nodes (40 TFlops)
- 10 GPU nodes (40 GPU, 20 TFlops CPU + 536 TFlops GPU)
- 1 High Memory CPU node (4 TFlops)
- Storage Device (31 * 15 + 70 = 535 TB)
In summary: 604 TFlops, 816 cores (1632 threads), 14.4 TB RAM.
ALICE has a second high-memory node. This node is not included above, as it is only available to the research group that purchased it.
ALICE is a pre-configuration system for the university to gain experience with managing, supporting and operating a university-wide HPC system. Once the system and governance have proven to be a functional research asset, it will be extended and continued for the coming years.
The descriptions below are for the configuration which is housed partly in the data centre at LMUY and partly in the data centre at Leiden University Medical Center (LUMC).
The cluster has two login nodes, also called head nodes. These are the nodes to which the users of ALICE can log in. These login nodes can be used to develop your HPC code and test/debug the programs. From the login nodes, you initiate the calls to the Slurm queuing system, spawning your compute jobs. The login nodes are also used to transfer data between the ALICE storage device and the university research storage data stores.
The login nodes have the following configuration:
|1||Huawei FusionServer 2288H V5|
|2||Xeon Gold 6126 2.6GHz 12 core||hyperthreading disabled|
|2||240 GB SSD||RAID 1 (OS disk)|
|3||8 TB SATA||RAID 5 (data disk)|
|1||Mellanox ConnectX-5 (EDR)||Infiniband|
The 20 CPU-based compute nodes (node001 - node020) have the following configuration:
|1||Huawei FusionServer X6000 V5|
|2||Xeon Gold 6126 2.6GHz 12 core||hyperthreading disabled|
|2||240 GB SSD||RAID 1 (OS disk)|
|3||8 TB SATA||RAID 5 (data disk)|
|1||Mellanox ConnectX-5 (EDR)||Infiniband|
Total: 480 cores @ 2.6GHz = 1248 coreGHz
The 10 GPU-based compute nodes (node851 - node860) have the following configuration:
|1||Huawei FusionServer G5500 / G560 V5|
|2||Xeon Gold 6126 2.6GHz 12 core||hyperthreading disabled|
|4x||PNY GeForce RTX 2080TI||GPU|
|2x||240 GB SSD||RAID 1 (OS disk)|
|3x||8 TB SATA||RAID 5 (data disk)|
Total: 240 cores @ 2.6GHz = 624 coreGHz
High Memory Node
The High Memory compute node (node801) has the following configuration:
|1||Dell PowerEdge R840|
|4||Xeon Gold 6128 3.4GHz 6 core||hyperthreading disabled|
|2||240 GB SSD||RAID 1 (OS disk)|
|13||2 TB SATA||RAID 5 (data disk)|
Total: 24 core @ 3.4GHz = 82 coreGHz
New February 2021: A second High Memory compute node (node802) has the following configuration:
|1||Supermicro SERVERline Individual|
|2||AMD EPYC 7662 2.0GHz 64 cores||hyperthreading enabled|
|2||256 GB SSD||RAID 1 (OS disk)|
|2||10 TB SATA||RAID 0 (data disk)|
Total: 128 core @ 2.0GHz = 256 coreGHz
Node802 was purchased by a research group from the Mathematical Institute and is only available to that research group.
Below each network segment is described in some detail.
The campus network provides the connectivity to access the ALICE cluster from outside. The part of the campus network that enters the ALICE cluster is shielded from the outside world and is only reachable through an SSH gateway. This part of the network provides user access to the login nodes. See the section Login to cluster for a detailed description of how to access the login nodes from your desktop.
This network is used by the job queuing system Slurm and by interactive jobs to transfer command and control information between the login nodes and the compute nodes.
This network is only for data transfer to and from the storage device. All data belonging to the shares /home, /software and /data is transported over this network, relieving the other networks of traffic. The mounts of these shares are automatically attached to the data network, so as a user you do not have to worry that data transfer might interfere with job queuing or inter-process communication.
This fast (100 Gbps) network is available for extremely fast, very low latency inter-node communication between the processes of your parallel jobs. MPI automatically selects this network for inter-node communication; you do not need to do anything for this.
Data Storage Device
The current configuration of ALICE is in a pre-configuration phase. For the moment, fast data storage is based on a simple NFS server. A full-blown distributed file system will be put in place in the second half of 2020.
|2||AMD Epyc 7261|
|2||240 GB SSD||RAID 1 (OS disk)|
|10||8 TB SATA||RAID 5 (data disk)|
Before we start using individual software packages, we need to understand why multiple versions of the software are available on ALICE systems and why users need to have a way to control which version they are using. The three biggest factors are:
- software incompatibilities;
- software versioning;
- dependencies.
Software incompatibility is a major headache for programmers. Sometimes the presence (or absence) of a software package will break others that depend on it. Two of the most famous examples are Python 2 and 3 and C compiler versions. Python 3 famously provides a python command that conflicts with that provided by Python 2. Software compiled against a newer version of the C libraries and then used when they are not present will result in a nasty 'GLIBCXX_3.4.20' not found error, for instance.
Software versioning is another common issue. A team might depend on a certain package version for their research project - if the software version were to change (for instance, if a package was updated), it might affect their results. Having access to multiple software versions allows researchers to prevent versioning issues from affecting their results.
Dependencies are where a particular software package (or even a particular version) depends on having access to another software package (or even a particular version of another software package). For example, the VASP materials science software may depend on having a particular version of the FFTW (Fastest Fourier Transform in the West) software library available for it to work. Environment modules are the solution to these problems.
A module is a self-contained description of a software package - it contains the settings required to run a software package and, usually, encodes required dependencies on other software packages.
There are a number of different environment module implementations commonly used on HPC systems: the two most common are TCL modules and Lmod. Both use similar syntax and the same concepts, so learning to use one will allow you to use whichever is installed on the system you are using. In both implementations, the module command is used to interact with environment modules. A sub-command is usually added to specify what you want to do. For a list of sub-commands, use module -h or module help. As for all commands, you can access the full help on the man pages with man module.
List of currently loaded modules
On login, you usually start out with a default set of modules. The module list command shows which modules you currently have loaded in your environment. This is what you will most likely see after logging in to ALICE:
[me@nodelogin02~]$ module list
Currently Loaded Modules:
  1) shared   2) DefaultModules   3) gcc/8.2.0   4) slurm/19.05.1
You can see that by default the modules for Slurm and the gcc compiler are loaded. The number behind the slash represents the version number.
List available modules
To see the available modules, use module -d avail. With the -d option, you will only get the default versions of the modules. For various software packages, older/other versions are also available and can be used if necessary. You can see all versions by omitting the -d option:
[me@nodelogin02~]$ module -d avail
------------------------------------------------ /cm/shared/easybuild/modules/all -------------------------------------------------
AMUSE-Miniconda2/4.7.10                          VTK/8.2.0-foss-2019b-Python-3.7.4
AMUSE-VADER/12.0.0-foss-2018a-Python-2.7.14      WebKitGTK+/2.24.1-GCC-8.2.0-2.31.1
AMUSE/12.0.0-foss-2018a-Python-2.7.14            X11/20190717-GCCcore-8.3.0
ATK/2.32.0-GCCcore-8.2.0                         XML-Parser/2.44_01-GCCcore-7.3.0-Perl-5.28.0
Autoconf/2.69-GCCcore-8.3.0                      XZ/5.2.4-GCCcore-8.3.0
Automake/1.16.1-GCCcore-8.3.0                    Yasm/1.3.0-GCCcore-8.3.0
Autotools/20180311-GCCcore-8.3.0                 ZeroMQ/4.3.2-GCCcore-8.2.0
Bazel/0.20.0-GCCcore-8.2.0                       amuse-framework/12.0.0-foss-2018a-Python-2.7.14
Bison/3.3.2                                      at-spi2-atk/2.32.0-GCCcore-8.2.0
Boost.Python/1.67.0-foss-2018b-Python-3.6.6      at-spi2-core/2.32.0-GCCcore-8.2.0
Boost/1.71.0-gompi-2019b                         binutils/2.32
etc etc etc .......
If you are searching for a specific software package or tool you can search for the full module name like this:
[me@nodelogin02~]$ module avail python
---------------------------------------- /cm/shared/easybuild/modules/all -----------------------------------------
AMUSE-GPU/12.0.0-foss-2018a-Python-2.7.14        AMUSE-VADER/12.0.0-foss-2018a-Python-2.7.14
AMUSE/12.0.0-foss-2018a-Python-2.7.14            AMUSE/13.1.0-foss-2018a-Python-3.6.4 (D)
Biopython/1.71-foss-2018a-Python-2.7.14          Biopython/1.73-foss-2019a
Biopython/1.75-foss-2019b-Python-3.7.4 (D)       Boost.Python/1.67.0-foss-2018b-Python-3.6.6
CGAL/4.11.1-foss-2018b-Python-3.6.6              CONCOCT/1.1.0-foss-2019a-Python-2.7.15
CheckM/1.0.18-foss-2019a-Python-2.7.15           Cython/0.25.2-foss-2018a-Python-2.7.14
Cython/0.25.2-foss-2018a-Python-3.6.4            Cython/0.29.3-foss-2019a-Python-3.7.2 (D)
DAS_Tool/1.1.1-foss-2019a-R-3.6.0-Python-3.7.2   Docutils/0.9.1-foss-2018a-Python-2.7.14
Docutils/0.9.1-foss-2018a-Python-3.6.4 (D)       GObject-Introspection/1.60.1-GCCcore-8.2.0-Python-3.7.2
IPython/5.7.0-foss-2018a-Python-2.7.14           IPython/6.4.0-foss-2018a-Python-3.6.4
IPython/7.7.0-foss-2019a-Python-3.7.2 (D)        Keras/2.2.4-foss-2019a-Python-3.7.2
Mako/1.0.7-foss-2017b-Python-2.7.14              Mako/1.0.7-foss-2018a-Python-2.7.14
Mako/1.0.7-foss-2018b-Python-2.7.15              Meson/0.50.0-GCCcore-8.2.0-Python-3.7.2
Meson/0.51.2-GCCcore-8.3.0-Python-3.7.4 (D)      MultiQC/1.7-foss-2018b-Python-3.6.6
NLTK/3.2.4-foss-2018a-Python-3.6.4               NLTK/3.2.4-foss-2019a-Python-3.7.2 (D)
PyCairo/1.18.0-foss-2018b-Python-3.6.6           PyTorch/1.3.1-fosscuda-2019b-Python-3.7.4
PyYAML/3.13-foss-2018b-Python-3.6.6              Python/2.7.14-foss-2017b
Python/2.7.14-foss-2018a                         Python/2.7.14-GCCcore-6.4.0-bare
Python/2.7.15-foss-2018b                         Python/2.7.15-GCCcore-7.3.0-bare
Python/2.7.15-GCCcore-8.2.0                      Python/2.7.16-GCCcore-8.3.0
Python/3.6.4-foss-2018a                          Python/3.6.6-foss-2018b
Python/3.7.2-GCCcore-8.2.0                       Python/3.7.4-GCCcore-8.3.0 (D)
etc etc etc .......
In the above output, you can see modules flagged with "(D)". This flag indicates the default module for a software package for which multiple versions exist.
If you want to get more information about a specific module, you can use the module whatis command:
[me@nodelogin02~]$ module whatis Python/3.7.4-GCCcore-8.3.0
Python/3.7.4-GCCcore-8.3.0 : Description: Python is a programming language that lets you work more quickly and integrate your systems more effectively.
Python/3.7.4-GCCcore-8.3.0 : Homepage: https://python.org/
Python/3.7.4-GCCcore-8.3.0 : URL: https://python.org/
Python/3.7.4-GCCcore-8.3.0 : Extensions: alabaster-0.7.12, asn1crypto-0.24.0, atomicwrites-1.3.0, attrs-19.1.0, Babel-2.7.0, bcrypt-3.1.7, bitstring-3.1.6, blist-1.3.6, certifi-2019.9.11, cffi-1.12.3, chardet-3.0.4, Click-7.0, cryptography-2.7, Cython-0.29.13, deap-1.3.0, decorator-4.4.0, docopt-0.6.2, docutils-0.15.2, ecdsa-0.13.2, future-0.17.1, idna-2.8, imagesize-1.1.0, importlib_metadata-0.22, ipaddress-1.0.22, Jinja2-2.10.1, joblib-0.13.2, liac-arff-2.4.0, MarkupSafe-1.1.1, mock-3.0.5, more-itertools-7.2.0, netaddr-0.7.19, netifaces-0.10.9, nose-1.3.7, packaging-19.1, paramiko-2.6.0, pathlib2-2.3.4, paycheck-1.0.2, pbr-5.4.3, pip-19.2.3, pluggy-0.13.0, psutil-5.6.3, py-1.8.0, py_expression_eval-0.3.9, pyasn1-0.4.7, pycparser-2.19, pycrypto-2.6.1, Pygments-2.4.2, PyNaCl-1.3.0, pyparsing-2.4.2, pytest-5.1.2, python-dateutil-2.8.0, pytz-2019.2, requests-2.22.0, scandir-1.10.0, setuptools-41.2.0, setuptools_scm-3.3.3, six-1.12.0, snowballstemmer-1.9.1, Sphinx-2.2.0, sphinxcontrib-applehelp-1.0.1, sphinxcontrib-devhelp-1.0.1, sphinxcontrib-htmlhelp-1.0.2, sphinxcontrib-jsmath-1.0.1, sphinxcontrib-qthelp-1.0.2, sphinxcontrib-serializinghtml-1.1.3, sphinxcontrib-websupport-1.1.2, tabulate-0.8.3, ujson-1.35, urllib3-1.25.3, virtualenv-16.7.5, wcwidth-0.1.7, wheel-0.33.6, xlrd-1.2.0, zipp-0.6.0
To load a software module, use module load. In the example below, we will use Python 3. Initially, Python 3 is not loaded and therefore not available for use. We can test this with the command which, which looks for programs the same way that Bash does; we can use it to tell us where a particular piece of software is stored.
[me@nodelogin01~]$ which python3
/usr/bin/which: no python3 in (/cm/shared/apps/slurm/18.08.4/sbin:/cm/shared/apps/slurm/18.08.4/bin:/cm/local/apps/gcc/8.2.0/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/me/.local/bin:/home/me/bin)
We can load the python3 command with:
[me@nodelogin01 ~]$ module load Python/3.7.2-GCCcore-8.2.0
[me@nodelogin01 ~]$ which python3
/cm/shared/easybuild/software/Python/3.7.2-GCCcore-8.2.0/bin/python3
So what just happened? To understand the output, we first need to understand the nature of the $PATH environment variable. $PATH is a special environment variable that controls where a Linux operating system (OS) looks for software. Specifically, $PATH is a list of directories (separated by :) that the OS searches through for a command. As with all environment variables, we can print it using echo:
[me@nodelogin01 ~]$ echo $PATH
/cm/shared/easybuild/software/Python/3.7.2-GCCcore-8.2.0/bin:/cm/shared/easybuild/software/XZ/5.2.4-GCCcore-8.2.0/bin:/cm/shared/easybuild/software/SQLite/3.27.2-GCCcore-8.2.0/bin:/cm/shared/easybuild/software/Tcl/8.6.9-GCCcore-8.2.0/bin:/cm/shared/easybuild/software/libreadline/8.0-GCCcore-8.2.0/bin:/cm/shared/easybuild/software/ncurses/6.1-GCCcore-8.2.0/bin:/cm/shared/easybuild/software/bzip2/1.0.6-GCCcore-8.2.0/bin:/cm/shared/easybuild/software/GCCcore/8.2.0/bin:/cm/shared/apps/slurm/19.05.1/sbin:/cm/shared/apps/slurm/19.05.1/bin:/cm/local/apps/gcc/8.2.0/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/me/.local/bin:/home/me/bin
You will notice a similarity to the output of the which command. In this case, there's only one difference: the /cm/shared/easybuild/software/Python/3.7.2-GCCcore-8.2.0/bin directory at the beginning.
When we used module load Python/3.7.2-GCCcore-8.2.0, it added this directory to the beginning of our $PATH. Let us examine what is there:
[me@nodelogin01 ~]$ ls /cm/shared/easybuild/software/Python/3.7.2-GCCcore-8.2.0/bin
2to3              futurize       pip          pytest             python3-config    rst2odt_prepstyles.py  sphinx-apidoc
2to3-3.7          idle3          pip3         py.test            pyvenv            rst2odt.py             sphinx-autogen
chardetect        idle3.7        pip3.7       python             pyvenv-3.7        rst2pseudoxml.py       sphinx-build
cygdb             netaddr        pybabel      python3            rst2html4.py      rst2s5.py              sphinx-quickstart
cython            nosetests      __pycache__  python3.7          rst2html5.py      rst2xetex.py           tabulate
cythonize         nosetests-3.7  pydoc3       python3.7-config   rst2html.py       rst2xml.py             virtualenv
easy_install      pasteurize     pydoc3.7     python3.7m         rst2latex.py      rstpep2html.py         wheel
easy_install-3.7  pbr            pygmentize   python3.7m-config  rst2man.py        runxlrd.py
Taking this to its conclusion, module load adds software to your $PATH. It "loads" software.
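What module load does to $PATH can be imitated by hand. The sketch below is not ALICE-specific; the directory and the tool name are made up for illustration. It prepends a directory to $PATH and shows that the shell now resolves a command from it first:

```shell
# Create a throwaway directory containing a tiny executable.
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho hello\n' > "$demo_dir/mytool"
chmod +x "$demo_dir/mytool"

# Prepending the directory is, in essence, what 'module load' does.
export PATH="$demo_dir:$PATH"

# The shell now finds 'mytool' in the prepended directory first.
command -v mytool
```

Because the search is left to right, a prepended directory always wins over system directories such as /usr/bin; this is why a loaded module's python3 shadows any system python3.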
A special note on this: module load will also load required software dependencies. If you compare the output below with what you had when you first logged in to ALICE, you will notice that several other modules have been loaded automatically, because the Python module depends on them.
[me@nodelogin01 ~]$ module list
Currently Loaded Modules:
  1) shared          5) GCCcore/8.2.0              9) libreadline/8.0-GCCcore-8.2.0  13) GMP/6.1.2-GCCcore-8.2.0
  2) DefaultModules  6) bzip2/1.0.6-GCCcore-8.2.0  10) Tcl/8.6.9-GCCcore-8.2.0       14) libffi/3.2.1-GCCcore-8.2.0
  3) gcc/8.2.0       7) zlib/1.2.11-GCCcore-8.2.0  11) SQLite/3.27.2-GCCcore-8.2.0   15) Python/3.7.2-GCCcore-8.2.0
  4) slurm/19.05.1   8) ncurses/6.1-GCCcore-8.2.0  12) XZ/5.2.4-GCCcore-8.2.0
A note of warning: when you load several modules, their dependencies can cause conflicts and problems later on. It is best to always check which other modules have been loaded automatically.
module unload “un-loads” a module. For the above example:
[me@nodelogin01 ~]$ module unload Python/3.7.2-GCCcore-8.2.0
[me@nodelogin01 ~]$ module list
Currently Loaded Modules:
  1) shared          5) GCCcore/8.2.0              9) libreadline/8.0-GCCcore-8.2.0  13) GMP/6.1.2-GCCcore-8.2.0
  2) DefaultModules  6) bzip2/1.0.6-GCCcore-8.2.0  10) Tcl/8.6.9-GCCcore-8.2.0       14) libffi/3.2.1-GCCcore-8.2.0
  3) gcc/8.2.0       7) zlib/1.2.11-GCCcore-8.2.0  11) SQLite/3.27.2-GCCcore-8.2.0
  4) slurm/18.08.4   8) ncurses/6.1-GCCcore-8.2.0  12) XZ/5.2.4-GCCcore-8.2.0
Important: Currently, unloading a module does not unload its dependencies (as you can see from the above output).
If you want to remove all the modules that are currently loaded, you can use the command module purge:
[me@nodelogin01 ~]$ module purge
[me@nodelogin01 ~]$ module list
No modules loaded
Note that this command will also unload the modules loaded by default on login, including Slurm. You can either manually load the modules again or source your bashrc with source ~/.bashrc.
File and I/O Management
The file system is one of the critical components to the service and users should aim to make the best use of the resources. This chapter details the different file systems in use, the best practices for reading and writing data (I/O), and basic housekeeping of your files.
Data Storage Policy
ALICE does not provide support for any type of controlled data. No controlled data (GDPR, HIPAA, EAR, FERPA, PII, CUI, ITAR, etc.) may be analysed or stored on any HPC storage. Users must not transfer sensitive data (data relating to people) to ALICE; data must be anonymized before it is transferred. If you are unsure about the contents or classification of your data, please contact the helpdesk.
ALICE is not a data management system where research data can be stored for longer periods of time. All data transferred to ALICE must be a copy; users must make sure that the data remains available somewhere else. All valuable data generated on ALICE must be moved off ALICE as soon as possible after the job has completed.
Data in the user’s home directory is backed up (see Backup & Restore). The home directory is intended to store scripts, software, executables etc, but is not meant to store large or temporary data sets.
Summary of available file systems
|File system||Directory||Disk Quota||Speed||Shared between nodes||Expiration||Backup||Files removed?|
|Home||/home||15 GB||Normal||Yes||None||Nightly incremental||No|
|Local scratch||/scratchdata||10 TB||Fast||No||End of job||No||No automatic deletion currently|
|Scratch-shared||/data||None (57 TB total)||Normal||Yes||At most 28 days||No||No automatic deletion currently|
|Software||/cm/shared||N/A||Normal||Yes||None||Nightly incremental||N/A (not for user storage)|
The home file system
The home file system contains the files you normally use to run your jobs, such as programs, scripts and job files. There is only limited space available for each user, which is why you should never store the data that you want to process, or that your jobs will produce, in your home directory.
By default, you have 15 GB of disk space available. Your current usage is shown when you type in the command uquota.
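If you only want a rough picture of your usage, standard tools work on any POSIX system. A small sketch (no ALICE-specific commands or paths assumed):

```shell
# Total size of everything in your home directory
# (can take a while if you have many files).
home_usage=$(du -sh "$HOME" 2>/dev/null | awk '{print $1}')

# Free space left on the file system backing your home directory.
home_free=$(df -h "$HOME" | awk 'NR==2 {print $4}')

echo "home usage: $home_usage, free space: $home_free"
```

Note that df reports free space on the whole file system, not your personal quota; only a quota tool can show the latter.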
The home file system is a network file system (NFS) that is available on all login and compute nodes, so your jobs can access it from every node. The downside is that the home file system is not particularly fast, especially in the handling of metadata: creating and deleting files, opening and closing files, many small updates to files, and so on.
For newly created accounts, the home directory only contains a link to your directory on scratch-shared.
In addition to the home file system that is shared among all nodes, you also need storage for large files that is shared among the nodes. For this, we have a shared scratch disk accessible through /data. The size of this shared scratch space is currently 57 TB and there is no quota for individual users.
Consider the following properties of the shared scratch space when you use it for your jobs:
- The speed of /data is similar to the home file system and thus slower than the local scratch disk (see next section).
- You share /data with all other users, and there may not be enough space to write all the files you want. Carefully consider how your job will behave if it tries to write to /data and there is insufficient space: it would be a waste of budget if the results of a long computation were lost because of it.
- Because you share the disk space with all users, make sure that you store large amounts of data only as long as necessary for completing your jobs.
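One way to guard against a job failing late because the shared space filled up is to check free space before writing. A hedged sketch using standard tools; the helper name, the 1 GiB threshold, and the /data path in the comment are just examples:

```shell
# have_space DIR KB: succeed only if the file system holding DIR
# has at least KB kibibytes available (df -P gives portable output).
have_space() {
    avail_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge "$2" ]
}

# In a job script this could be used as, for example:
#   have_space /data 1048576 || { echo "low on /data" >&2; exit 1; }
```

Failing early like this costs nothing compared to discovering a full disk after hours of computation.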
The local scratch file system on /scratchdata
The scratch file system on /scratchdata is a local file system on each node. It is intended as fast, temporary storage that can be used while running a job. The local scratch file system of a node can only be accessed while you run a job on that node.
There is no quota for the scratch file system, but its use is ultimately limited by the available storage space (see the table in Summary of available file systems). Scratch disks are not backed up and are cleaned at the end of a job. This means that you have to move your data back to the shared scratch space at the end of your job, or all your data will be lost.
Since the disks are local, read and write operations on /scratchdata are much faster than on the home file system or the shared scratch file system. This makes it very suitable for I/O-intensive operations.
How to best use local scratch
In general, use of the local scratch file system on /scratchdata should be incorporated into your job: copy your input files from your directory on /data to the local scratch at the start of the job, create all temporary files needed by your job on the local scratch (assuming they don't need to be shared with other nodes), and copy all output files back to your directory on /data at the end of the job.
There are two things to note:
- On the node that your job is running on, a directory will be created for you at the start of the job. Its name is constructed from SLURM_JOB_USER and SLURM_JOB_ID, where SLURM_JOB_USER is your ALICE username and SLURM_JOB_ID is the ID of the job. You do not have to define these two variables yourself; they will be available for you to use in your job script.
- Do not forget to copy your results back to /data! The local scratch space will be cleaned and the directory removed after your job finishes; your results will be lost if you forget this step.
Software file system
The software file system provides a consistent set of software packages for the entire cluster. It is mounted at /cm/shared on every node. You do not need to access this file system directly, because we provide a much easier way of using the available software. Also, as a user, you cannot change the content of this file system.
- We do nightly incremental backups of the home and software file system.
- Files that are open at the time of the backup will be skipped.
- We can restore files and/or directories that you accidentally removed up to 15 days back, provided they already existed during the last successful backup.
- There is no backup for the shared scratch file system.
Each worker node has multiple file system mounts.
- /dev/shm - On each worker node you can also create a virtual file system directly in memory, for extremely fast data access. It is the fastest available file system, but be advised that its use counts against the memory used for your job. The maximum size is set to half the physical RAM of the worker node.
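A minimal sketch of using /dev/shm for a fast temporary file (the directory name is arbitrary; remember that anything left behind stays in RAM until it is deleted):

```shell
# Create a RAM-backed working directory, use it, and clean it up.
shm_dir="/dev/shm/${USER:-demo}_tmp"
mkdir -p "$shm_dir"

echo "fast temporary data" > "$shm_dir/work.tmp"   # lives in memory
content=$(cat "$shm_dir/work.tmp")                 # read it back

rm -rf "$shm_dir"   # always clean up: shm is not cleared until reboot
```

The cleanup step matters: files in /dev/shm consume memory for as long as they exist, not just for the duration of your job.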
Your I/O activity can have dramatic effects on the performance of your jobs and on other users. The general advice is: ask for help with improving your I/O activity if you are uncertain. The time spent can be saved many times over in faster job execution.
- Be aware of I/O load. If your workflow creates a lot of I/O activity then creating dozens of jobs doing the same thing may be detrimental.
- Avoid storing many files in a single directory. Hundreds of files is probably ok; tens of thousands is not.
- Avoid opening and closing files repeatedly in tight loops. If possible, open files once at the beginning of your workflow / program, then close them at the end.
- Watch your quotas. You are limited in capacity and file count. Use "uquota" to check your usage. Note that in /home the scheduler writes files in a hidden directory assigned to you.
- Avoid frequent snapshot files which can stress the storage.
- Limit file copy sessions. You share the bandwidth with others. Two or three scp sessions are probably ok; >10 is not.
- Consolidate files. If you are transferring many small files consider collecting them in a tarball first.
- Use parallel I/O where available, e.g. "module load phdf5".
- Use local storage for working space. Use the local storage on each node for your data. This will improve the performance of your job and reduce the I/O load on the shared file systems.
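The consolidation advice above can be sketched as follows: pack many small result files into one archive and transfer that single file. The directory layout and file names here are purely illustrative:

```shell
# Collect many small files into a single tarball before transferring.
workdir=$(mktemp -d)
cd "$workdir"
mkdir results
for i in 1 2 3; do
    echo "run $i" > "results/run_$i.txt"
done

tar czf results.tar.gz results/      # one archive instead of many files
tar tzf results.tar.gz               # list the contents as a sanity check

# A single transfer then replaces many, e.g. (hypothetical host/path):
#   scp results.tar.gz you@your-workstation:/some/path/
```

One scp of a tarball puts far less load on the shared file system and the network than thousands of per-file transfers.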