Data storage

From ALICE Documentation

Revision as of 09:35, 14 October 2020 by Deuler

File and I/O Management

The file system is one of the critical components of the service, and users should aim to make the best use of its resources. This chapter details the different file systems in use, the best practices for reading and writing data (I/O), and basic housekeeping of your files.

Data Storage Policy

ALICE does not provide support for any type of controlled data. No controlled data (GDPR, HIPAA, EAR, FERPA, PII, CUI, ITAR, etc.) can be analysed or stored on any HPC storage. Users must not transfer sensitive data (data related to people) to ALICE. Data must be anonymized before it can be transferred to ALICE. In case you are unsure about the contents/classification of the data, please contact the helpdesk.

ALICE is not a data management system where research data can be stored for longer periods of time. All data that is transferred to ALICE must be copies of data. Users must make sure that data that is transferred to ALICE remains available somewhere else. All data of value that is generated on ALICE must be moved off ALICE as soon as possible after the job has completed.

Data in the user’s home directory is backed up (see Backup & Restore). The home directory is intended to store scripts, software, executables, etc., but is not meant to store large or temporary data sets.

Summary of available file systems

| File system | Directory | Disk Quota | Speed | Shared between nodes | Expiration | Backup | Files removed? |
| Home | /home | 15 GB | Normal | Yes | None | Nightly incremental | No |
| Local scratch | /scratchdata | 10 TB | Fast | No | End of job | No | No automatic deletion currently |
| Scratch-shared | /data | N/A (57 TB total) | Normal | Yes | At most 28 days | No | No automatic deletion currently |
| Cluster-wide software | /cm/shared | N/A (not for user storage) | Normal | Yes | None | Nightly incremental | N/A (not for user storage) |

The home file system

The home file system contains the files you normally use to run your jobs, such as programs, scripts, and job files. Only limited space is available for each user, which is why you should never store the data that you want to process, or that your jobs will produce, in your home directory.

By default, you have 15 GB disk space available. Your current usage is shown when you type in the command:

quota -s
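
If the quota report shows that your home directory is nearly full, it can help to see which subdirectories take up the space. The following is a minimal sketch using the standard du, sort and tail tools; the function name largest_dirs is hypothetical and not something ALICE provides.

```shell
# Print the N largest entries (default 10) among directory $1 and its
# immediate subdirectories, with sizes in KiB, smallest first.
# The last line printed is the total for directory $1 itself.
largest_dirs() {
    du -k -d 1 "$1" 2>/dev/null | sort -n | tail -n "${2:-10}"
}
```

For example, largest_dirs "$HOME" 5 lists the four largest subdirectories of your home directory followed by the overall total.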

The home file system is a network file system (NFS) that is available on all login and compute nodes. Thus, your jobs can access the home file system from all nodes. The downside is that the home file system is not particularly fast, especially for metadata-heavy operations such as creating and deleting files, opening and closing files, and making many small updates to files.

For newly created accounts, the home directory only contains a link to your directory on scratch-shared.

The scratch-shared file system on /data

In addition to the home file system that is shared among all nodes, you also need storage for large files that is shared among the nodes.

For this, we have a shared scratch disk accessible through

cd /data or cd ~/data

The size of this shared scratch space is currently 57 TB and there is no quota for individual users.

Consider the following properties of the shared scratch space when you use it for your jobs:

  • The speed of /data is similar to the home file system and thus slower than the local scratch disk (see next section).
  • You share /data with all other users and there may not be enough space to write all the files you want. Thus, carefully think how your job will behave if it tries to write to /data and there is insufficient space: it would be a waste of budget if the results of a long computation are lost because of it.
  • Because you share the disk space with all users, make sure that you store large amounts of data only as long as is necessary for completing your jobs.
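
One way to guard against a full shared scratch disk is to check the free space before a long computation starts. The sketch below assumes the POSIX output format of df -Pk; the function name has_free_space and any threshold you pass to it are hypothetical and should be adapted to your own job.

```shell
# Succeed if the file system holding directory $1 has at least $2 KiB free.
has_free_space() {
    local avail_kb
    # With -P, df prints one header line and one data line;
    # the fourth column of the data line is the available space in KiB.
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$2" ]
}
```

Near the top of a job script you could then write, for instance, has_free_space /data 104857600 || exit 1 to stop early when less than 100 GiB is free. Note that this is only a snapshot: other users can still fill the disk while your job is running.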

The local scratch file system on /scratchdata

The scratch file system on /scratchdata is a local file system on each node. It is intended as fast, temporary storage that can be used while running a job. The local scratch file system of a node can only be accessed when you run a job on that node.

There is no quota for the scratch file system, but use of it is eventually limited by the available storage space (see the Table in Summary of available file systems). Scratch disks are not backed up and are cleaned at the end of a job. This means that you have to move your data back to the shared scratch space at the end of your job or all your data will be lost.

Since the disks are local, read and write operations on /scratchdata are much faster than on the home file system or the shared scratch file system. This makes it very suitable for I/O intensive operations.

How to best use local scratch

In general, accessing the local scratch file system on /scratchdata should be incorporated into your job. For example, copy your input files from your directory on /home or /data to the local scratch at the start of a job, create all temporary files needed by your job on the local scratch (assuming they don't need to be shared with other nodes) and copy all output files at the end of a job back to your /home or /data directory.

There are two things to note:

  • On the node that your job is running on, a directory will be created for you upon the start of a job. The directory name is /scratchdata/${SLURM_JOB_USER}/${SLURM_JOB_ID} where SLURM_JOB_USER is your ALICE username and SLURM_JOB_ID is the id of the job. You do not have to define these two variables yourself. They will be available for you to use in your job script.
  • Do not forget to copy your results back to /home or /data! The local scratch space will be cleaned and the directory will be removed after your job finishes and your results will be lost if you forget this step.
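
The workflow described above can be sketched as a job script. This is an illustration under stated assumptions, not an official ALICE template: the #SBATCH values, the input and output directory names under ~/data, and the wc -l placeholder computation are all hypothetical. Only the scratch path /scratchdata/${SLURM_JOB_USER}/${SLURM_JOB_ID} is taken from the text above.

```shell
#!/bin/bash
#SBATCH --job-name=scratch_demo   # hypothetical job name
#SBATCH --time=00:10:00           # hypothetical time limit

# Copy the contents of directory $1 into directory $2 (created if missing).
stage() {
    mkdir -p "$2" && cp -r "$1"/. "$2"/
}

run_job() {
    # Job-specific directory on the node-local scratch disk.
    local scratch="/scratchdata/${SLURM_JOB_USER}/${SLURM_JOB_ID}"
    local input="$HOME/data/input"     # hypothetical input location on /data
    local output="$HOME/data/output"   # hypothetical output location on /data

    # 1. Copy the input files to the local scratch disk.
    stage "$input" "$scratch/input" || return 1
    mkdir -p "$scratch/output"

    # 2. Do the work on local scratch (placeholder for your real program).
    wc -l "$scratch"/input/* > "$scratch/output/line_counts.txt"

    # 3. Copy the results back before the job ends and scratch is cleaned.
    stage "$scratch/output" "$output"
}

# Only run inside a Slurm job, where SLURM_JOB_ID is set.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    run_job
fi
```

Keeping the copy-back as the last step of the job means the results reach shared storage only if the computation got that far; for long jobs you may prefer to copy intermediate results back more often.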

Software file system

The software file system provides a consistent set of software packages for the entire cluster. It is mounted at /cm/shared on every node.

You do not need to access this file system directly, because we provide a much easier way of using available software. Also, as a user, you cannot change the content of this file system.

Backup & Restore

  • We do nightly incremental backups of the home and software file systems.
  • Files that are open at the time of the backup will be skipped.
  • We can restore files and/or directories that you accidentally removed, up to 15 days back, provided they already existed during the last successful backup.
  • There is no backup for the shared scratch file system.

Compute Local

Each worker node has multiple file system mounts.

  • /dev/shm - On each worker, you may also create a virtual file system directly in memory, for extremely fast data access. It is the fastest available file system, but be advised that anything stored there counts against the memory used by your job. The maximum size is set to half the physical RAM size of the worker node.
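
A minimal sketch of using /dev/shm for short-lived intermediate files is shown below. The directory prefix job_ and the file name tmp.dat are hypothetical, and the fallback to /tmp is only there so the snippet also runs on machines without a writable /dev/shm.

```shell
# Use /dev/shm when it exists and is writable; otherwise fall back to /tmp.
shm_base=/dev/shm
[ -d "$shm_base" ] && [ -w "$shm_base" ] || shm_base="${TMPDIR:-/tmp}"

# Create a private working directory with a unique name.
workdir=$(mktemp -d "$shm_base/job_XXXXXX")

# Remove the directory when the shell exits so the memory is released.
trap 'rm -rf "$workdir"' EXIT

# Show how much space is available before writing.
df -h "$shm_base"

# Write intermediate data to the in-memory directory.
echo "intermediate data" > "$workdir/tmp.dat"
```

Because files in /dev/shm count against your job's memory, request enough memory for both your program and the data you place there.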

Best Practices - Shared File System

Your I/O activity can have dramatic effects on the performance of your jobs and on other users. If you are uncertain, ask for advice on improving your I/O activity. The time spent can be saved many times over in faster job execution.

  1. Be aware of I/O load. If your workflow creates a lot of I/O activity then creating dozens of jobs doing the same thing may be detrimental.
  2. Avoid storing many files in a single directory. Hundreds of files is probably ok; tens of thousands is not.
  3. Avoid opening and closing files repeatedly in tight loops.  If possible, open files once at the beginning of your workflow / program, then close them at the end.
  4. Watch your quotas.  You are limited in capacity and file count. Use "uquota". In /home the scheduler writes files in a hidden directory assigned to you.
  5. Avoid frequent snapshot files which can stress the storage.
  6. Limit file copy sessions. You share the bandwidth with others.  Two or three scp sessions are probably ok; >10 is not.
  7. Consolidate files. If you are transferring many small files consider collecting them in a tarball first.
  8. Use parallel I/O if available, for example via "module load phdf5".
  9. Use local storage for working space. Use the local storage on each node for your data. This will improve the performance of your job and reduce I/O load on the shared file systems.
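
Points 2 and 7 above can be sketched with standard tools. The function names count_files and pack_dir are hypothetical helpers, not commands provided on ALICE.

```shell
# Count the regular files directly inside directory $1.
count_files() {
    find "$1" -maxdepth 1 -type f | wc -l
}

# Pack directory $1 into the gzip-compressed archive $2.
# The archive stores paths relative to the parent of $1.
pack_dir() {
    tar -czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}
```

For example, if count_files reports tens of thousands of files in a results directory, pack_dir can turn it into a single tarball that is then transferred in one copy session instead of many small ones.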