Modules

From ALICE Documentation

Revision as of 12:26, 1 May 2020 by Dijkbvan (talk | contribs)

Software installation and maintenance on an HPC cluster such as the VSC clusters poses a number of challenges not encountered on a workstation or a departmental cluster. The HPC therefore needs a system that makes it easy to activate or deactivate the software packages that you require for your program execution.

The documentation in this “Running batch jobs” section includes a description of the general features of job scripts, how to submit them for execution and how to monitor their progress.

Environment Variables

The program environment on the HPC is controlled by pre-defined settings, which are stored in environment (or shell) variables. For more information about environment variables, see the chapter “Getting started”, section “Variables” in the intro to Linux. All the software packages that are installed on the HPC cluster require different settings. These packages include compilers, interpreters, mathematical software such as MATLAB and SAS, as well as other applications and libraries.

The module command

In order to administer the active software and their environment variables, the module system has been developed, which:

  1. Activates or deactivates software packages and their dependencies.
  2. Allows setting and unsetting of environment variables, including adding and deleting entries from list-like environment variables.
  3. Does this in a shell-independent fashion (necessary information is stored in the accompanying module file).
  4. Takes care of versioning aspects: for many libraries, multiple versions are installed and maintained, and the module system keeps track of them. For instance, it does not allow multiple versions of the same package to be loaded at the same time.
  5. Takes care of dependencies: some software requires an older version of a particular library to run correctly (or at all), which is why a variety of versions is available for important libraries. Modules typically load the required dependencies automatically.

This is all managed with the module command, which is explained in the next sections. There is also a shorter ml command that does exactly the same as the module command and is easier to type. Whenever you see a module command, you can replace module with ml.
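To make the second point in the list above more concrete, here is a minimal sketch in plain shell of what adding and deleting an entry of a list-like variable amounts to. The function names path_prepend and path_remove, the variable DEMO_PATH and the directory /apps/example/1.2.3/bin are made up for this illustration. The module system does this bookkeeping for you, and its real implementation is not shown here.

```shell
# Illustrative only: mimic how a list-like (colon-separated) variable
# gains and loses an entry. All names and paths here are hypothetical.

# Print the list $1 with entry $2 added at the front.
path_prepend() {
    if [ -z "$1" ]; then
        printf '%s\n' "$2"
    else
        printf '%s\n' "$2:$1"
    fi
}

# Print the list $1 with every occurrence of entry $2 removed.
path_remove() {
    result=""
    old_ifs=$IFS
    IFS=:
    for entry in $1; do
        if [ "$entry" != "$2" ]; then
            result="${result:+$result:}$entry"
        fi
    done
    IFS=$old_ifs
    printf '%s\n' "$result"
}

DEMO_PATH="/usr/bin:/bin"
DEMO_PATH=$(path_prepend "$DEMO_PATH" "/apps/example/1.2.3/bin")   # what a "load" might do
echo "$DEMO_PATH"
DEMO_PATH=$(path_remove "$DEMO_PATH" "/apps/example/1.2.3/bin")    # what an "unload" might do
echo "$DEMO_PATH"
```

After the second step DEMO_PATH is back to its starting value. This is the behaviour you would expect from loading and then unloading a module.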

Available modules

A large number of software packages are installed on the HPC clusters. A list of all currently available software can be obtained by typing:

 $ module available

It’s also possible to execute module av or module avail; these are shorter to type and do the same thing.

This will give some output such as:

 $ module av 2>&1 | more 
 --- /apps/gent/SL6/sandybridge/modules/all --
 ABAQUS/6.12.1-linux-x86_64 
 AMOS/3.1.0-ictce-4.0.10 ant/1.9.0-Java-1.7.0_40 
 ASE/3.6.0.2515-ictce-4.1.13-Python-2.7.3 
 ASE/3.6.0.2515-ictce-5.5.0-Python-2.7.6 ...

You can also check whether a specific piece of software, such as a compiler or an application (e.g., MATLAB), is installed on the HPC:

 $ module av 2>&1 | grep -i -e "matlab" 
 MATLAB/2010b 
 MATLAB/2012b 
 MATLAB/2013b

Because you may not know which letters of the module name are capitalised, the search above uses the “-i” option of grep to match case-insensitively. The result is the list of matching software packages that can be loaded. When loading a module, however, the casing is important: lowercase and uppercase letters matter in module names.
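The search above can be wrapped in a small helper. This is only a sketch: the function name filter_modules is hypothetical. The 2>&1 redirection is copied from the commands above and presumably reflects that the listing is written to standard error; check this on your system.

```shell
# Hypothetical helper: keep only the lines of a module listing (read from
# standard input) that contain the given name, ignoring case.
filter_modules() {
    grep -i -e "$1"
}

# Only query the real module system when it is available in this shell.
if command -v module >/dev/null 2>&1; then
    module av 2>&1 | filter_modules "matlab"
fi
```

Because the helper reads from standard input, it can also be tried on a saved listing, e.g. filter_modules matlab < listing.txt, where listing.txt is a file you created yourself.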

Organisation of modules in toolchains

The number of modules on Alice can be overwhelming, and it is not always immediately clear which modules can be loaded safely together if you need to combine multiple programs in a single job to get your work done. Therefore the VSC has defined so-called toolchains. A toolchain contains a C/C++ and Fortran compiler, an MPI library and some basic math libraries for (dense matrix) linear algebra and FFT. Two toolchains are defined on most VSC systems. One, the intel toolchain, consists of the Intel compilers, MPI library and math libraries. The other one, the foss toolchain, consists of Open Source components: the GNU compilers, OpenMPI, OpenBLAS and the standard LAPACK and ScaLAPACK libraries for the linear algebra operations and the FFTW library for FFT. The toolchains are refreshed twice a year, which is reflected in their name. E.g., foss/2020a is the first version of the foss toolchain in 2020. The toolchains are then used to compile a lot of the software installed on the VSC clusters. You can recognise those packages easily as they all contain the name of the toolchain after the version number in their name (e.g., Python/2.7.12-intel-2016b). Only packages compiled with the same toolchain name and version can work together without conflicts.
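The naming convention described above can be checked mechanically. The following sketch extracts the toolchain from a module name and compares two modules. The function names toolchain_of and same_toolchain are hypothetical. The pattern only recognises the intel and foss toolchains named above and assumes a version of the form four digits plus a or b.

```shell
# Hypothetical helper: print "<toolchain>/<version>" for a module name such
# as Python/2.7.12-intel-2016b. Prints nothing if no known toolchain is found.
toolchain_of() {
    printf '%s\n' "$1" |
        sed -n -E 's#^[^/]+/.*-(intel|foss)-([0-9]{4}[ab]).*$#\1/\2#p'
}

# Succeed only if both module names carry the same toolchain name and version.
same_toolchain() {
    tc_a=$(toolchain_of "$1")
    tc_b=$(toolchain_of "$2")
    [ -n "$tc_a" ] && [ "$tc_a" = "$tc_b" ]
}

toolchain_of Python/2.7.12-intel-2016b    # prints intel/2016b

if same_toolchain Python/2.7.12-intel-2016b secondexample/2.7-intel-2016b; then
    echo "same toolchain"
fi
```

Modules without a toolchain in their name (e.g., ABAQUS/6.12.1-linux-x86_64) produce no output, so same_toolchain deliberately fails for them instead of guessing.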

Loading and unloading modules

module load

To “activate” a software package, you load the corresponding module file using the module load command:

 $ module load example

This will load the default version of example. For some packages, multiple versions are installed; the load command automatically chooses the default version (if one was set by the system administrators) or otherwise the most recent version (i.e., the lexicographically last one after the /). However, you should specify a particular version to avoid surprises when newer versions are installed:

 $ module load secondexample/2.7-intel-2016b
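If the fallback choice is indeed purely lexicographical, as described above, it does not always match the numerically newest version. The sketch below illustrates this with plain sort. The function name lexically_last and the versions 9.0 and 10.0 are made up for the illustration, and the module system's own ordering may be smarter than this.

```shell
# Hypothetical helper: print the lexicographically last line of a list of
# module names read from standard input (byte-wise comparison).
lexically_last() {
    LC_ALL=C sort | tail -n 1
}

printf '%s\n' example/1.2.3 example/4.5.6 | lexically_last   # prints example/4.5.6
printf '%s\n' example/9.0 example/10.0 | lexically_last      # prints example/9.0
```

In the second case the character 9 sorts after 1, so 9.0 wins over 10.0. This is one more reason to spell out the version you want.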

The ml command is a shorthand for module load: ml example/1.2.3 is equivalent to module load example/1.2.3. Modules need not be loaded one by one; the two module load commands can be combined as follows:

 $ module load example/1.2.3 secondexample/2.7-intel-2016b

This will load the two modules as well as their dependencies (unless there are conflicts between both modules).

module list

Obviously, you need to be able to keep track of the modules that are currently loaded. Assuming you have run the module load commands stated above, you will get the following:

 $ module list
 Currently Loaded Modulefiles:
 1) example/1.2.3                                        6) imkl/11.3.3.210-iimpi-2016b
 2) GCCcore/5.4.0                                        7) intel/2016b
 3) icc/2016.3.210-GCC-5.4.0-2.26                        8) examplelib/1.2-intel-2016b
 4) ifort/2016.3.210-GCC-5.4.0-2.26                      9) secondexample/2.7-intel-2016b
 5) impi/5.1.3.181-iccifort-2016.3.210-GCC-5.4.0-2.26

You can also just use the ml command without arguments to list loaded modules. It is important to note at this point that other modules (e.g., intel/2016b) are also listed, although the user did not explicitly load them. This is because secondexample/2.7-intel-2016b depends on it (as indicated in its name), and the system administrator specified that the intel/2016b module should be loaded whenever this second example module is loaded. There are advantages and disadvantages to this, so be aware of automatically loaded modules whenever things go wrong: they may have something to do with it!

module unload

To unload a module, one can use the module unload command. It works consistently with the load command and reverses the latter’s effect. However, the dependencies of the package are NOT automatically unloaded; you will have to unload the packages one by one. When the second example module is unloaded, only the following modules remain:

 $ module unload secondexample
 $ module list
 Currently Loaded Modulefiles:
 1) example/1.2.3                        5) impi/5.1.3.181-iccifort-2016.3.210-GCC-5.4.0-2.26
 2) GCCcore/5.4.0                        6) imkl/11.3.3.210-iimpi-2016b
 3) icc/2016.3.210-GCC-5.4.0-2.26        7) intel/2016b
 4) ifort/2016.3.210-GCC-5.4.0-2.26      8) examplelib/1.2-intel-2016b

To unload the second example module, you can also use ml -secondexample.

Notice that the version was not specified: there can only be one version of a module loaded at a time, so unloading modules by name is not ambiguous. However, checking the list of currently loaded modules is always a good idea, since unloading a module that is currently not loaded will not result in an error.
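Since unloading a module that is not loaded passes silently, a script can check first. The sketch below assumes that the module system exports the colon-separated LOADEDMODULES variable; this is common, but verify it on your cluster. The function name is_loaded is hypothetical.

```shell
# Hypothetical helper: succeed if a module with the given name (with or
# without a version) appears in the colon-separated LOADEDMODULES variable.
is_loaded() {
    case ":${LOADEDMODULES:-}:" in
        *":$1:"* | *":$1/"*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_loaded secondexample; then
    module unload secondexample
else
    echo "secondexample is not loaded; nothing to unload"
fi
```

Matching on ":name/" rather than on a bare substring avoids false positives. For example, a query for GCC will not match GCCcore/5.4.0.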

Purging all modules

In order to unload all modules at once, and hence be sure to start in a clean state, you can use:

 $ module purge

This is always safe: the cluster module (the module that specifies which cluster jobs will get submitted to) will not be unloaded (because it’s a so-called “sticky” module).

Using explicit version numbers

Once a module has been installed on the cluster, the executables or libraries it comprises are never modified. This policy ensures that the user’s programs will run consistently, at least if the user specifies a specific version. Failing to specify a version may result in unexpected behaviour. Consider the following example: the user decides to use the example module and at that point in time, just a single version 1.2.3 is installed on the cluster. The user loads the module using:

 $ module load example

rather than

 $ module load example/1.2.3

Everything works fine, up to the point where a new version of example is installed, 4.5.6. From then on, the user’s load command will load the latter version, rather than the intended one, which may lead to unexpected problems. See for example section 8.8.
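One way to guard against this in your own scripts is to check that every module name you intend to load carries an explicit version. This is a minimal sketch; the function name has_version is hypothetical and the check is purely textual. It does not ask the module system whether the version exists.

```shell
# Hypothetical helper: succeed only if the module name has the form
# <name>/<version>, i.e. it carries an explicit version.
has_version() {
    case "$1" in
        ?*/?*) return 0 ;;
        *) return 1 ;;
    esac
}

for mod in example secondexample/2.7-intel-2016b; do
    if has_version "$mod"; then
        echo "ok: $mod"
    else
        echo "warning: $mod has no explicit version" >&2
    fi
done
```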

Consider the following example modules:

 $ module avail example/ 
 example/1.2.3 
 example/4.5.6

Let’s now generate a version conflict with the example module, and see what happens.

 $ module av example/ 
 example/1.2.3        example/4.5.6 
 $ module load example/1.2.3 example/4.5.6 
 Lmod has detected the following error: A different version of the ’example’ module is already loaded (see output of ’ml’). 
 $ module swap example/4.5.6

Note: A module swap command combines the appropriate module unload and module load commands.

Search for modules

With the module spider command, you can search for modules:

 $ module spider example 
 -------------------------------------------------------------------------------
 example: 
 -------------------------------------------------------------------------------
     Description: 
        This is just an example
     Versions: 
        example/1.2.3 
        example/4.5.6 
 -------------------------------------------------------------------------------
  For detailed information about a specific "example" module (including how to load the modules) use the module’s full name. 
  For example:
       module spider example/1.2.3 
 -------------------------------------------------------------------------------
 

It’s also possible to get detailed information about a specific module:

 $ module spider example/1.2.3
 -----------------------------------------------------------------------------------------
 example: example/1.2.3 
 -----------------------------------------------------------------------------------------
    Description: 
      This is just an example
    You will need to load all module(s) on any one of the lines below before the "example/1.2.3" module is available to load.
    cluster/golett 
    cluster/phanpy 
    cluster/swalot  
    cluster/skitty 
    cluster/victini
 Help:
 Description 
 =========== 
 This is just an example
 More information 
 ================ 
 - Homepage: https://example.com

Get detailed info

To get a list of all possible commands, type:

 $ module help

Or to get more information about one specific module package:

 $ module help example/1.2.3 
 ----------- Module Specific Help for ’example/1.2.3’ --------------------------
   This is just an example - Homepage: https://example.com/

Save and load collections of modules

If you have a set of modules that you need to load often, you can save these in a collection. This will enable you to load all the modules you need with a single command. In each module command shown below, you can replace module with ml.

First, load all modules you want to include in the collection:

 $ module load example/1.2.3 secondexample/2.7-intel-2016b

Now store it in a collection using module save. In this example, the collection is named my-collection.

 $ module save my-collection

Later, for example in a jobscript or a new session, you can load all these modules with module restore:

 $ module restore my-collection

You can get a list of all your saved collections with the module savelist command:

 $ module savelist
 Named collection list (For LMOD_SYSTEM_NAME = "CO7-sandybridge"):
   1) my-collection

To get a list of all modules a collection will load, you can use the module describe command:

 $ module describe my-collection 
 1) example/1.2.3                                        6) imkl/11.3.3.210-iimpi-2016b
 2) GCCcore/5.4.0                                        7) intel/2016b
 3) icc/2016.3.210-GCC-5.4.0-2.26                        8) examplelib/1.2-intel-2016b
 4) ifort/2016.3.210-GCC-5.4.0-2.26                      9) secondexample/2.7-intel-2016b
 5) impi/5.1.3.181-iccifort-2016.3.210-GCC-5.4.0-2.26

To remove a collection, remove the corresponding file in $HOME/.lmod.d:

 $ rm $HOME/.lmod.d/my-collection
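In a job script, the save and restore steps above can be combined so that the collection is created on first use and restored afterwards. This is a sketch under stated assumptions: the helper name collection_exists is hypothetical, and the check for a "name.system" file variant is an assumption for sites where LMOD_SYSTEM_NAME is set. Look at the actual file names in $HOME/.lmod.d on your cluster.

```shell
# Hypothetical helper: succeed if a saved collection with the given name
# exists under $HOME/.lmod.d. The "<name>.<system>" variant is an assumption
# for sites where LMOD_SYSTEM_NAME is set.
collection_exists() {
    for f in "$HOME/.lmod.d/$1" "$HOME/.lmod.d/$1".*; do
        if [ -f "$f" ]; then
            return 0
        fi
    done
    return 1
}

# Restore the collection if it exists; otherwise build and save it once.
# The outer check skips everything when no module command is available.
if command -v module >/dev/null 2>&1; then
    if collection_exists my-collection; then
        module restore my-collection
    else
        module purge
        module load example/1.2.3 secondexample/2.7-intel-2016b
        module save my-collection
    fi
fi
```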

Getting module details

To see how a module would change the environment, you can use the module show command:

 $ module show Python/2.7.12-intel-2016b 
 whatis("Description: Python is a programming language that lets you work more quickly and integrate your systems more effectively. - Homepage: http://python.org/ ")
 conflict("Python") 
 load("intel/2016b") 
 load("bzip2/1.0.6-intel-2016b") 
 ... 
 prepend_path(...) 
 setenv("EBEXTSLISTPYTHON","setuptools-23.1.0,pip-8.1.2,nose-1.3.7,numpy-1.11.1,scipy-0.17.1,ytz-2016.4", ...)

It’s also possible to use the ml show command instead: they are equivalent. Here you can see that the Python/2.7.12-intel-2016b comes with a whole bunch of extensions: numpy, scipy, ... You can also see the modules the Python/2.7.12-intel-2016b module loads: intel/2016b, bzip2/1.0.6-intel-2016b, ... If you’re not sure what all of this means: don’t worry, you don’t have to know; just load the module and try to use the software.
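If you only want the list of modules that a module loads, the load(...) lines can be filtered out of the module show output. The sketch below assumes the output format shown above. The function name list_loads is hypothetical, and the 2>&1 redirection is a precaution in case the output goes to standard error.

```shell
# Hypothetical helper: read "module show" output on standard input and print
# the argument of every load("...") line, one module per line.
list_loads() {
    sed -n 's/^[[:space:]]*load("\([^"]*\)").*$/\1/p'
}

# Only query the real module system when it is available in this shell.
if command -v module >/dev/null 2>&1; then
    module show Python/2.7.12-intel-2016b 2>&1 | list_loads
fi
```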