ALICE node status

Revision as of 14:48, 26 May 2021 by Deuler (talk | contribs) (Current Issues)

ALICE node status

Gateway: OK
Login nodes: OK
CPU nodes: OK
GPU nodes: OK
High-memory nodes: OK

Current Issues

  • Logging in to ALICE ssh gateway:
    • We are experiencing issues with logging in to the ALICE gateway. We are looking into it.
    • It is very likely that you will be unable to log in. You might be prompted for a password even though you have set up ssh keys, and your correct password might be rejected.
    • We are deploying a new gateway and tests indicate that it is working properly. We are working on completing the setup so that existing keys continue to work. Once this has been verified, we will switch to the new server.
    • The issue with connecting to the ALICE ssh gateway has been resolved. A new gateway has been deployed and all keys were transferred to it.
    • We have also changed the domain to point to the IP address of the new server. However, it might take a few hours until the domain resolves correctly for you. In the meantime, if you are in a hurry, you can replace the domain '' with the IP address of the new gateway:
    • Status: ALMOST SOLVED. We are still waiting for the UL firewall change that will allow access to the new IP address.
    • Last Updated: 26 May 2021, 14:47 CEST
  • Copying data to the shared scratch via sftp:
    • There is currently an issue on the sftp gateway which prevents users from copying data to their shared scratch directory, i.e., /home/<username>/data.
    • A current work-around is to use scp or sftp via the ssh gateway and the login nodes.
    • Status: Work in Progress
    • Last Updated: 19 Apr 2021, 12:17 CET
  • Slurm issue with ssh to compute nodes when more than one job is running:
    • The current Slurm version has a bug which prevents users from logging in to the compute node on which their job is running if two or more jobs are running on that node. We are looking into this.
    • If you try to log in to a node which has more than one job running, you will see this error message: "Access denied by pam_slurm_adopt: you have no active jobs on this node Authentication failed."
    • If your job is the only one running on the node, ssh to the node should work without a problem.
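
To check in advance whether ssh to a compute node is likely to hit the Slurm issue described above, one option is to count the running jobs on that node. The following is a minimal sketch and not an official ALICE tool. It assumes that `squeue` is available on the login node and that the site configuration lets you see other users' jobs; the helper names and the node name in the usage comment are hypothetical.

```python
import subprocess


def count_jobs(squeue_output: str) -> int:
    """Count job IDs in the output of `squeue -h -o %i` (one ID per line)."""
    return len([line for line in squeue_output.splitlines() if line.strip()])


def ssh_likely_blocked(node: str) -> bool:
    """Return True if squeue lists two or more running jobs on `node`."""
    # -h: no header line, -w: restrict to this node,
    # -t RUNNING: running jobs only, -o %i: print only the job ID
    result = subprocess.run(
        ["squeue", "-h", "-w", node, "-t", "RUNNING", "-o", "%i"],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_jobs(result.stdout) >= 2


# Example usage (hypothetical node name):
#   if ssh_likely_blocked("node001"):
#       print("More than one job on this node; ssh may be denied.")
```

If the function returns True, the pam_slurm_adopt error quoted above is the expected outcome rather than a problem with your keys.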

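The work-around for the sftp gateway issue above (using scp via the ssh gateway and the login nodes) can be sketched as follows. This example only builds the command line; it does not run it. It assumes an OpenSSH client that supports the `ProxyJump` option. The user name, host names and file name are hypothetical placeholders, so substitute the values from your own ALICE account and ssh setup.

```python
import shlex


def build_scp_command(local_path, user, gateway, login_node, remote_dir):
    """Build an scp command that reaches the login node via the ssh gateway."""
    jump = f"{user}@{gateway}"                    # host to jump through
    target = f"{user}@{login_node}:{remote_dir}"  # final destination
    return ["scp", "-o", f"ProxyJump={jump}", local_path, target]


# Hypothetical values, for illustration only.
cmd = build_scp_command(
    "results.tar.gz",
    "jdoe",
    "gateway.example.org",
    "login1",
    "/home/jdoe/data",
)
print(shlex.join(cmd))
```

The printed command can be pasted into a terminal, or the list can be passed to `subprocess.run` once the placeholders have been replaced.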
For other recently solved issues, see: Solved Issues