File storage

From ALICE Documentation

File systems on ALICE


File system     Path          Quota             Speed   Shared between nodes  Expiration       Backup
Home            /home         100 GB            Normal  Yes                   None             Nightly incremental
Scratch         /scratchdata  10 TB             Fast    No                    End of job       No
Scratch-shared  /data         N.A. (57 TB)      Normal  Yes                   At most 28 days  No
Software        /cm/shared    N.A. (read-only)  Normal  Yes                   None             Nightly incremental

The home file system

The home file system contains the files you normally use. By default, you have a quota of 100 GB. To see your current usage, run:

quota -h 
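If the quota report shows that you are close to the limit, it can help to see which directories take up the most space. The following is a minimal sketch using the standard du, sort and head tools; the function name largest_dirs and its default of ten entries are hypothetical, and hidden directories are skipped because of the * glob.

```shell
#!/bin/bash
# largest_dirs DIR [N]
# Print the N largest entries directly under DIR, sizes in KiB, largest first.
# Hypothetical helper for illustration; not an ALICE-provided command.
largest_dirs() {
    local dir="${1:-$HOME}" n="${2:-10}"
    # du -sk gives one summary line per entry; sort numerically, descending.
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$n"
}
```

For example, largest_dirs "$HOME" 5 lists the five largest entries in your home directory. Because du has to visit every file, expect it to take a while on directories with many small files.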

The home file system is a network file system (NFS) that is available on all login and compute nodes, so your jobs can access it from any node. The downside is that the home file system is not particularly fast, especially for metadata operations: creating and deleting files, opening and closing files, making many small updates to files, and so on.

Backup & restore
  • We do nightly incremental backups.
  • Files that are open at the time of backup will be skipped.
  • We can restore files and directories that you accidentally removed, going back up to 15 days, provided they already existed at the time of the last successful backup.

The scratch file system

The scratch file system is intended as fast, temporary storage that can be used while running a job. Every compute node in the ALICE system contains a local disk for the scratch file system that can only be accessed by that particular node. There is no quota for the scratch file system; its use is ultimately limited by the capacity of these disks (see the description of the ALICE system). Scratch disks are not backed up and are cleaned at the end of a job.

Since the disks are local, read and write operations on the scratch file system are much faster than on the home file system. This makes the scratch file system very suitable for I/O-intensive operations.

How to best use scratch

In general, the best way to use scratch is to copy your input files from the home or data file system to scratch at the start of a job, create all temporary files needed by your job on scratch (assuming they don't need to be shared with other nodes), and copy all output files back to the home or data file system at the end of the job. There are two things to note:

  • A directory will be created for you when a job starts on a compute node. The directory name is /scratchdata/${SLURM_JOB_USER}/${SLURM_JOB_ID}, where SLURM_JOB_USER is your ALICE username and SLURM_JOB_ID is the ID of the job.
  • Don't forget to copy your results back to the home or data file system! Scratch will be cleaned and the directory will be removed after your job finishes and your results will be lost if you forget this step.
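
The workflow above could be sketched as a Slurm batch script along the following lines. This is a minimal sketch under stated assumptions, not an official ALICE template: the job name, the time limit, the directories under $HOME/myproject, the function names stage and run_on_scratch, and the line-counting workload are all hypothetical placeholders. Only the scratch path pattern is taken from the text above.

```shell
#!/bin/bash
#SBATCH --job-name=scratch-demo   # hypothetical job name
#SBATCH --time=00:10:00           # hypothetical time limit

# stage SRC DST: copy the contents of SRC into DST, creating DST if needed.
stage() {
    local src="$1" dst="$2"
    mkdir -p "$dst"
    cp -r "$src"/. "$dst"/
}

# run_on_scratch INPUT_DIR OUTPUT_DIR SCRATCH_DIR
# Stage input to scratch, work there, then copy the results back.
run_on_scratch() {
    local input_dir="$1" output_dir="$2" scratch="$3"
    stage "$input_dir" "$scratch/input"
    mkdir -p "$scratch/output"
    # Placeholder workload (assumption): count the lines in all input files.
    ( cd "$scratch" && cat input/* | wc -l > output/line_count.txt )
    # Copy results back before the job ends and scratch is cleaned.
    stage "$scratch/output" "$output_dir"
}

# Run automatically only inside a Slurm job, where SLURM_JOB_ID is set.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    run_on_scratch "$HOME/myproject/input" "$HOME/myproject/output" \
        "/scratchdata/${SLURM_JOB_USER}/${SLURM_JOB_ID}"
fi
```

Keeping the copy-back step as the last action of run_on_scratch makes it harder to forget, but results are still lost if the job is killed before that step runs, for instance when it exceeds its time limit.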

The scratch-shared file system

In addition to temporary storage that is local to each node (like scratch), you may need temporary storage that is shared among nodes. For this we have a shared scratch disk, accessible through:

cd /data or cd ~/data

The size of this shared scratch space is currently 1 TB and there is no quota for individual users. Note that this shared scratch has two disadvantages compared to the local scratch disk:

  • The speed of /data is similar to the home file system and thus slower than the local scratch disk at /scratchdata/${SLURM_JOB_USER}/${SLURM_JOB_ID}.
  • You share /data with all other users, so there may not be enough space to write all the files you want. Think carefully about how your job will behave if it tries to write to /data and there is insufficient space: it would be a waste of budget if the results of a long computation were lost because of it.
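
One way to guard against a full /data is to check the free space before writing. The sketch below uses the POSIX df -P -k output format; the function name has_free_space and the 50 GB figure in the usage comment are assumptions for illustration. Since other users can fill the disk between the check and the write, this reduces the risk but does not remove it.

```shell
#!/bin/bash
# has_free_space DIR REQUIRED_KB
# Succeeds if the file system holding DIR has at least REQUIRED_KB KiB free.
# Hypothetical helper for illustration; not an ALICE-provided command.
has_free_space() {
    local dir="$1" required_kb="$2" available_kb
    # With -P, the second output line is the data row; field 4 is "Available".
    available_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR==2 {print $4}')
    [ -n "$available_kb" ] && [ "$available_kb" -ge "$required_kb" ]
}

# Usage inside a job (the 50 GB requirement is an assumption):
#   if has_free_space /data $((50 * 1024 * 1024)); then
#       ... write results to /data ...
#   else
#       echo "Not enough space on /data; keeping results elsewhere" >&2
#   fi
```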

Software file system

  • /cm/shared - This mount provides a consistent set of binaries for the entire cluster.

Compute Local

Each worker node has multiple filesystem mounts.

  • /dev/shm - On each worker node you can also create files in a virtual file system that lives directly in memory, for extremely fast data access. It is the fastest available file system, but be advised that it counts against the memory used by your job. The maximum size is set to half the physical RAM size of the worker node.
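
A job could use /dev/shm along the following lines. This is a sketch, not an ALICE-provided recipe: the function name make_mem_workdir and the job-... directory naming are hypothetical, and the explicit cleanup is a precaution, since this page does not say whether /dev/shm is cleaned automatically when a job ends.

```shell
#!/bin/bash
# make_mem_workdir [BASE]
# Create a private working directory under BASE (default: /dev/shm)
# and print its path. Hypothetical helper for illustration.
make_mem_workdir() {
    local base="${1:-/dev/shm}"
    mktemp -d "$base/job-${SLURM_JOB_ID:-manual}.XXXXXX"
}

# Run automatically only inside a Slurm job on a node that has /dev/shm.
if [ -n "${SLURM_JOB_ID:-}" ] && [ -d /dev/shm ]; then
    df -h /dev/shm                      # show the size limit and current usage
    workdir=$(make_mem_workdir) || exit 1
    trap 'rm -rf "$workdir"' EXIT       # free the memory when the script exits
    # ... place frequently accessed files in "$workdir" and run the job ...
fi
```

Remember to request enough memory for both your program and the files you place in /dev/shm.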