ALICE User Documentation Wiki
Welcome to the ALICE HPC user documentation.
ALICE is Leiden University's computing facility for state-of-the-art research and education. With ALICE you have the world of computing at your fingertips. On this wiki, you can find the information you need to get started and to become more skilled in using computing to support your research and education.
We welcome questions and comments on the documentation; they help us improve the range of information we provide here.
If you are unsure about where to go next, have a look below.
What is ALICE?
Please check out the About ALICE pages for background information, a quick overview, and how to acknowledge ALICE.
What's new with ALICE?
To get information about updates, upgrades, events, planned maintenance and more, have a look at the News page.
Here is the most recent news:
- 25 Feb. 2021 - Next major maintenance window on 08 March 2021: Please have a look at the maintenance page for details on our planned work and how it affects you.
- 12 Feb. 2021 (Update 22 Feb. 2021) - SSH Connection Stability: If your ssh connection to ALICE has recently started breaking up after a few minutes of being idle, please check the following settings in your ssh configuration for ALICE. If this does not solve the issue, please contact the ALICE Helpdesk.
  - Linux, macOS, or Windows with the OpenSSH command-line client: Add "ServerAliveInterval 60" and "ServerAliveCountMax 3" to your ssh config settings.
  - MobaXterm: Go to Settings -> SSH -> SSH settings and enable "SSH keepalive".
  - PuTTY: Go to Settings -> Connection and set a non-zero value in "Seconds between keepalives" (e.g., 60).
- 25 Jan. 2021 - Outlook for ALICE in 2021: We have updated the section outlining our expansion plans for ALICE in 2021 (Future plans). Two major items this year will be the addition of a new parallel file storage system and the expansion of the GPU nodes. But there is more on our agenda, so stay tuned...
- 08 Jan. 2021 - SURF HPC Workshops: SURF is offering HPC-related workshops on various topics. You can find a list of upcoming workshops (and more) on the SURF website. Workshops of interest to HPC users are:
  - Webinar Introduction Supercomputing
  - Webinar Introduction HPC Cloud
  - Using the Amsterdam Modeling Suite in HPC systems
  - SURF Research Week
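For the OpenSSH keepalive settings mentioned in the 12 Feb. 2021 news item, here is a minimal sketch of a client configuration entry. It assumes you connect through a host alias named "alice"; the alias, HostName and User values are hypothetical placeholders, not the actual ALICE addresses, and only the two ServerAlive lines come from the advice in the news item. The file is usually ~/.ssh/config on Linux and macOS, and %USERPROFILE%\.ssh\config with the Windows OpenSSH client.

```
# Hypothetical host alias: add the two ServerAlive lines to the Host entry you already use for ALICE
Host alice
    # Placeholders only; replace with your own connection details
    HostName <ALICE login node address>
    User <your username>
    # If no data has been received from the server for 60 seconds, send a keepalive request
    ServerAliveInterval 60
    # Disconnect after 3 keepalive requests go unanswered
    ServerAliveCountMax 3
```

With this entry, "ssh alice" picks up the keepalive settings automatically. For a single session you can pass the same options on the command line instead, e.g. ssh -o ServerAliveInterval=60 -o ServerAliveCountMax=3 followed by your usual login arguments.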
Just Getting Started?
If you're new to ALICE, please check out the User Guide.
What more can I do with ALICE?
If you already have experience with ALICE and/or HPC, have a look at the Advanced Guide pages. Please note that many of the pages here are still under construction and subject to change.
What else is there about ALICE?
If you need more information on general topics, such as hardware, storage, and policies, please take a look at the Documentation pages. Please note that many of the pages here are still under construction and subject to change.
Have a question or feedback on ALICE?
If you have a question about ALICE, need help using it, or want to give us feedback, please see the Support page to find out how to contact us.
Status of ALICE?
If you would like to know how busy ALICE is and whether all nodes are up, please have a look at the Current Status Overview.
This is a quick overview:
ALICE node status
- Login nodes: OK
- CPU nodes: OK
- GPU nodes: OK
- High-memory nodes: OK
- SSH connection breaking up after a few minutes:
  - Since last week, we have received several reports that ssh connections to ALICE are being closed after a few minutes of being idle. This was not the case before 1 Feb.
  - Changes to the ssh gateway require the client to keep the SSH connection alive. You can do this with the ServerAliveInterval setting (e.g., "ServerAliveInterval 60") in your ssh config settings for ALICE.
  - Status: Potential solution posted. Waiting for user feedback.
  - Last updated: 12 Feb 2021, 15:45 CET
- Slurm issue with ssh to compute nodes when more than one job is running:
  - The current Slurm version has a bug that prevents users from logging into the compute node on which their job is running if two or more jobs are running on that node. We are looking into this.
  - If you try to log into a node on which more than one job is running, you will see this error message: "Access denied by pam_slurm_adopt: you have no active jobs on this node Authentication failed."
  - If your job is the only one running on the node, ssh to the node should work without a problem.