Current Status Overview


Revision as of 13:04, 2 February 2021 by Dijkbvan

ALICE node status

Gateway: OK
Login nodes: OK
CPU nodes: OK
GPU nodes: OK
High-memory nodes: OK

Current Issues

  • SSH Keys on new Gateway:
    • We have received multiple reports that ssh keys are not working properly on the new gateway because of bad permissions. This issue seems to affect some users but not all. We are looking into it.
    • The permissions in the home directories should be fixed now. If you are currently logged in, please log out and log in again. Should you still encounter issues, please contact the ALICE Helpdesk.
    • Status: SOLVED
    • Last Updated: 27 May 2021, 09:33 CEST
  • Logging in to ALICE ssh gateway:
    • We are experiencing issues with logging in to the ALICE gateway. We are looking into it.
    • It is very likely that you are unable to log in. You might be prompted for a password even though you have set up ssh keys, and your correct password might be rejected.
    • We are deploying a new gateway and tests indicate that it is working properly. We are working on completing the setup so that existing keys continue to work. Once this has been verified, we will switch to the new server.
    • The issue with connecting to the ALICE ssh gateway has been resolved. A new gateway has been deployed and all keys were transferred to the new gateway.
    • We have also changed the domain to point to the IP address of the new server. The domain should resolve to the new server by now. If it does not and you are still connected to the old gateway, you can either wait a bit longer or, if you are in a hurry, replace the domain "" with the IP address of the new gateway:
    • The new gateway was temporarily not available from outside the Leiden University network. This has been resolved.
    • Status: SOLVED
    • Last Updated: 26 May 2021, 17:00 CEST
  • Copying data to the shared scratch via sftp:
    • There is currently an issue on the sftp gateway which prevents users from copying data to their shared scratch directory, i.e., /home/<username>/data
    • A current work-around is to use scp or sftp via the ssh gateway and the login nodes.
    • Status: Work in Progress
    • Last Updated: 19 Apr 2021, 12:17 CET
  • Slurm issue with ssh to compute nodes when more than one job is running:
    • The current slurm version has a bug which prevents users from logging into the compute node on which their job is running if two or more jobs are running on the node. We are looking into this.
    • If you try to log into a node which has more than one job running, you will see this error message: "Access denied by pam_slurm_adopt: you have no active jobs on this node Authentication failed."
    • If your job is the only one running on the node, ssh to the node should work without a problem.
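
For the sftp work-around listed above (copying via the ssh gateway and the login nodes), the following is a minimal sketch of how such a copy command could be assembled. The host names and the user name are hypothetical placeholders, not the real ALICE addresses; only the target directory /home/<username>/data is taken from the notice above. The sketch assumes an OpenSSH client that supports the ProxyJump option.

```python
import shlex


def build_scp_command(local_path, username, gateway_host, login_host):
    """Build an scp command that reaches a login node through the ssh gateway.

    gateway_host and login_host are placeholders (assumptions); substitute
    the host names given in the ALICE documentation.
    """
    jump = f"{username}@{gateway_host}"
    # Shared scratch directory as named in the notice: /home/<username>/data
    target = f"{username}@{login_host}:/home/{username}/data"
    # "-o ProxyJump=..." makes scp connect to the gateway first and then
    # hop on to the login node.
    return ["scp", "-o", f"ProxyJump={jump}", local_path, target]


if __name__ == "__main__":
    cmd = build_scp_command(
        "results.tar.gz", "jdoe", "gateway.example.org", "login1.example.org"
    )
    # Print the command for inspection instead of running it.
    print(shlex.join(cmd))
```

The command is only printed so that it can be checked before use. Once the placeholders are replaced, the same list could be passed to subprocess.run.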

See here for other recently solved issues: Solved Issues
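
If ssh keys are still rejected after the permission fix described under "SSH Keys on new Gateway", a quick self-check may help before contacting the helpdesk. The sketch below rests on an assumption: that the problem is of the kind OpenSSH's StrictModes check rejects, i.e. a home directory, ~/.ssh directory, or authorized_keys file that is writable by group or others. The notice above does not say which permissions were wrong. The function name is illustrative only.

```python
import os
import stat


def ssh_permission_problems(home):
    """Return descriptions of paths that are group- or world-writable.

    Checks the home directory, ~/.ssh and ~/.ssh/authorized_keys.
    Paths that do not exist are skipped.
    """
    candidates = [
        home,
        os.path.join(home, ".ssh"),
        os.path.join(home, ".ssh", "authorized_keys"),
    ]
    problems = []
    for path in candidates:
        if not os.path.exists(path):
            continue
        mode = os.stat(path).st_mode
        # Write permission for group or others is what this check flags.
        if mode & (stat.S_IWGRP | stat.S_IWOTH):
            problems.append(
                f"{path}: mode {stat.S_IMODE(mode):o} is group/world writable"
            )
    return problems


if __name__ == "__main__":
    for line in ssh_permission_problems(os.path.expanduser("~")):
        print(line)
```

An empty result only means that this particular check passed. Ownership problems and server-side configuration are not covered.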

ALICE usage statistics past 4 hours

  • Cluster Load

  • Number of running processes