ALICE User Documentation Wiki


Revision as of 22:52, 3 May 2021 by Kosterj1 (talk | contribs)
Off to research computing Wonderland

Welcome to the ALICE HPC user documentation.

ALICE is the computing facility for research and education at Leiden University. With ALICE you have the world of computing at your fingertips. On this wiki, you will find the information you need to get started and become more skilled in using a compute cluster for research and education.

We appreciate any questions and comments on the content of the documentation so that we can improve the information that we supply here.

If you are unsure about where to go next, have a look below.

What is ALICE?

The About ALICE pages give some background information, a quick overview and how to acknowledge ALICE in your publications.

How can I get an account?

The page Getting an Account explains how to request an account on ALICE.

What's new with ALICE?

To get information about updates, upgrades, events, planned maintenance and more, have a look at the News page.

Here is the most recent news:

Latest News

  • 2 Jun. 2021 - Rclone available on ALICE: Rclone is available on ALICE and there are instructions on how to set it up to transfer files to and from SurfDrive and ResearchDrive: Data transfer to and from ALICE. This is a new feature and feedback on your experience is very welcome.
  • 29 Apr. 2021 - ALICE User Survey 2021 closed: The ALICE User Survey 2021 is closed. We have received responses from 76 users. We are thrilled to have this many contributions. Thank you very much for participating in the survey. We will go through all the answers now and share results from the survey here on the wiki with you.
  • 12 Feb. 2021 (Update 22 Feb. 2021) - SSH Connection Stability: If your ssh connection has recently started to drop after a few minutes of being idle, please check the settings below for your ssh configuration for ALICE. If this does not solve the issue, please contact the ALICE Helpdesk.
    • Linux, macOS, or Windows using an OpenSSH command line connection: Make sure you add "ServerAliveInterval 60" and "ServerAliveCountMax 3" to your ssh config settings.
    • MobaXterm: Go to Settings -> SSH -> SSH settings and enable "SSH keepalive".
    • PuTTY: Go to Settings -> Connection and set a non-zero value in "Seconds between keepalives" (e.g., 60).
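
For OpenSSH, the two options mean the following: with "ServerAliveInterval 60" the client sends a keepalive message whenever no data has arrived from the server for 60 seconds, and with "ServerAliveCountMax 3" it closes the connection after three of these messages go unanswered. The Python sketch below is only an illustration: it renders a config block containing both options and checks whether a given config text already sets them. The host alias "alice-example" and both function names are hypothetical and are not taken from the ALICE setup; use the Host entry from your own ssh configuration.

```python
import re


def keepalive_stanza(host_alias: str, interval: int = 60, count_max: int = 3) -> str:
    """Render an OpenSSH client config block with the keepalive options.

    host_alias is a placeholder chosen by the caller, not an ALICE hostname.
    """
    return (
        f"Host {host_alias}\n"
        f"    ServerAliveInterval {interval}\n"
        f"    ServerAliveCountMax {count_max}\n"
    )


def has_keepalive(config_text: str) -> bool:
    """Return True if both keepalive options appear somewhere in config_text.

    This is a simple textual check: it ignores comments and blank lines and
    does not verify which Host block the options belong to.
    """
    keys = set()
    for line in config_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        # OpenSSH accepts both "Key value" and "Key=value"; keywords are
        # case-insensitive, so compare them in lower case.
        keyword = re.split(r"[\s=]+", stripped, maxsplit=1)[0].lower()
        keys.add(keyword)
    return {"serveraliveinterval", "serveralivecountmax"} <= keys


# Hypothetical alias, shown only to illustrate the resulting config block.
print(keepalive_stanza("alice-example"))
```

The printed block can be compared against the matching Host entry in ~/.ssh/config. A True result from has_keepalive only shows that both options occur in the text, not that they apply to the connection you use for ALICE.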

Just Getting Started?

If you're new to ALICE, please check out the User Guide.

What more can I do with ALICE?

If you already have experience with ALICE and/or HPC, have a look at the Advanced Guide pages. Please note that many of the pages here are still under construction and subject to change.

What else is there about ALICE?

If you need more information on general topics, such as hardware, storage, and policies, take a look at the Documentation pages. Please note that many of the pages here are still under construction and subject to change.

Have a question or feedback on ALICE?

If you have a question about ALICE, need help with using it, or want to give us some feedback, see the Support page to find out how to contact us.

Status of ALICE?

If you would like to know how busy ALICE is and whether all nodes are up, have a look at the Current Status Overview.

This is a quick overview:

ALICE node status

Gateway: OK
Login nodes: OK
CPU nodes: OK
GPU nodes: OK
High-memory nodes: OK

Current Issues

  • SSH Keys on new Gateway:
    • We have received multiple reports that ssh keys are not working properly on the new gateway because of bad permissions. This issue seems to affect some users but not all. We are looking into it.
    • The permissions in the home directories should be fixed now. If you have been logged in during this time, please log out and log in again. Should you still encounter issues, please contact the ALICE Helpdesk.
    • Status: SOLVED
    • Last Updated: 27 May 2021, 09:33 CEST
  • Logging in to ALICE ssh gateway:
    • We are experiencing issues with logging in to the ALICE gateway. We are looking into it.
    • It is very likely that you are unable to log in. You might be prompted for a password even though you have set up ssh keys, and your correct password is rejected.
    • We are deploying a new gateway and tests indicate that it is working properly. We are working on completing the setup so that existing keys continue to work. Once this has been verified, we will switch to the new server.
    • The issue with connecting to the ALICE ssh gateway has been resolved. A new gateway has been deployed and all keys were transferred to the new gateway.
    • We have also changed the domain to point to the IP of the new server. The domain should be resolved properly by now, i.e., it should use the IP of the new server. In case it is not and you still get connected to the old gateway, you can either wait a bit longer or if you are in a hurry, you can replace the domain "" by the IP of the new gateway:
    • The new gateway was temporarily not available from outside the Leiden University network. This has been resolved.
    • Status: SOLVED
    • Last Updated: 26 May 2021, 17:00 CEST
  • Copying data to the shared scratch via sftp:
    • There is currently an issue on the sftp gateway which prevents users from copying data to their shared scratch directory, i.e., /home/<username>/data.
    • A current work-around is to use scp or sftp via the ssh gateway and the login nodes.
    • Status: Work in Progress
    • Last Updated: 19 Apr 2021, 12:17 CET
  • Slurm issue with ssh to compute nodes when more than one job is running:
    • The current Slurm version has a bug which prevents users from logging into the compute node on which their job is running if two or more jobs are running on the node. We are looking into this.
    • If you try to log into a node which has more than one job running, you will see this error message: "Access denied by pam_slurm_adopt: you have no active jobs on this node Authentication failed."
    • If your job is the only one running on the node, ssh to the node should work without a problem.

See here for other recently solved issues: Solved Issues