More on login nodes
From ALICE Documentation
ALICE has several login nodes, both for reliability and to spread the interactive workload. For security reasons you must connect using SSH: every Linux distribution includes the ssh client for interactive logins, along with scp and sftp, which use SSH for file transfers. If your Linux, macOS or UNIX machine does not have ssh, first check your distribution's package repositories (and install the most recent version available). Alternatively, download and build OpenSSH yourself, or use MobaXterm on Microsoft Windows.
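A minimal sketch of connecting and transferring files over SSH. The username and host below are placeholders, not the real ALICE address; substitute the details supplied with your account.

```shell
# Placeholder account and login address - replace with your own:
login="<username>@login.alice.example.edu"

echo "ssh $login"                   # interactive login
echo "scp data.tar.gz $login:~/"    # file transfer, also over SSH
```

scp and sftp authenticate exactly like ssh, so once an interactive login works, file transfers need no extra setup.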
Login nodes are not intended for production workload. Any large or long-lasting job is liable to be terminated by an automatic script.
Know when you are on a login node. Your shell prompt, or the command hostname, will tell you the name of the node you are currently on. Note that the ssh gateway host itself is only a secure portal from the outside and serves no compute function.
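To check which node you are on, as described above:

```shell
# Print the hostname of the machine you are currently logged in to;
# on a login node this shows that node's name.
hostname

# The same information is usually available in the HOSTNAME variable:
echo "${HOSTNAME:-$(hostname)}"
```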
- Appropriate activities on the login nodes:
- Compiling code and developing applications,
- Defining and submitting your jobs,
- Post-processing and managing data,
- Monitoring running applications.
- Avoid computationally intensive activity on the login nodes.
- Don't run research applications on the login nodes. If a batch job is not appropriate, use an interactive session instead.
- Don't launch too many simultaneous processes. It is fine to compile on a login node, but avoid using all of the resources. For example, "make -j 14" will use more than half of the cores.
- That script you run to monitor job status several times a second should probably run every few minutes.
- Intensive I/O can slow the login node for everyone: for example, multiple simultaneous copies, or running "ls -l" on directories containing thousands of files.
- Hyperthreading is turned off on the login nodes, so running multiple threads per core is generally not productive. MKL is an exception to that, if it is relevant to you.
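A sketch of choosing a considerate build parallelism on a shared login node. The one-quarter fraction is a suggestion, not a site rule, and the srun partition name is a placeholder.

```shell
# Use only a fraction of the login node's cores when compiling.
# nproc reports the number of available cores; here we take a quarter,
# with a floor of one job.
JOBS=$(( $(nproc) / 4 ))
if [ "$JOBS" -lt 1 ]; then JOBS=1; fi
echo "building with: make -j $JOBS"

# Anything compute-heavy belongs in an interactive Slurm session
# rather than on the login node (partition name is a placeholder):
# srun --partition=testing --cpus-per-task=4 --time=00:30:00 --pty bash
```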
The cluster has two login nodes, also called head nodes. These are the nodes to which users of ALICE log in, and they can be used to develop, test and debug your HPC code. From the login nodes, you submit jobs to the Slurm queueing system, which spawns your compute jobs on the cluster. The login nodes are also used to transfer data between the ALICE storage device and the university research storage data stores.
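A minimal Slurm batch script to submit from a login node. The resource values and job name are illustrative, not site-specific recommendations.

```shell
# Write a minimal Slurm batch script:
cat > myjob.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
echo "Running on $(hostname)"
EOF

# From a login node, submit it and inspect the queue:
# sbatch myjob.slurm
# squeue -u "$USER"
```

The job itself then runs on a compute node, leaving the login node free for interactive work.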
The login nodes have the following configuration:
| Qty | Component | Notes |
|-----|-----------|-------|
| 1 | Huawei FusionServer 2288H V5 | |
| 2 | Xeon Gold 6126 2.6 GHz, 12 cores | hyperthreading disabled |
| 2 | 240 GB SSD | RAID 1 (OS disk) |
| 3 | 8 TB SATA | RAID 5 (data disk) |
| 1 | Mellanox ConnectX-5 (EDR) | InfiniBand |