Current Status Overview

ALICE node status

Login nodes: OK
CPU nodes: OK
GPU nodes: OK
High-memory nodes: OK

Current Issues

  • SSH connections dropping after a few minutes of being idle
    • We have received several reports that, since last week, SSH connections to ALICE are closed after a few minutes of being idle. This was not the case before 1 Feb.
    • Changes to the SSH gateway now require the client to keep the SSH connection alive. This can be achieved by using the ServerAliveInterval setting (e.g., "ServerAliveInterval 60") in your ssh config settings for ALICE.
    • Status: Potential solution posted. Waiting for user feedback.
    • Last Updated: 12 Feb 2021, 15:45 CET
  • Slurm issue with ssh to compute nodes when more than one job is running
    • The current Slurm version has a bug that prevents users from logging into the compute node on which their job is running if two or more jobs are running on that node. We are looking into this.
    • If you try to log into a node that has more than one job running, you will see this error message: "Access denied by pam_slurm_adopt: you have no active jobs on this node Authentication failed."
    • If your job is the only one running on the node, ssh to the node should work without a problem.
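
For the idle-disconnect issue above, the ServerAliveInterval workaround could be added to the OpenSSH client configuration (typically ~/.ssh/config) roughly as follows. This is a minimal sketch: the host alias "alice" and the placeholders in angle brackets are assumptions for illustration and must be replaced with your actual ALICE login host and username.

```
# ~/.ssh/config
# "alice" is a hypothetical alias; replace the placeholders with your own values.
Host alice
    HostName <ALICE login host>
    User <your username>
    # Send a keep-alive message after 60 seconds without data from the server
    ServerAliveInterval 60
```

With an entry like this, connecting via "ssh alice" would apply the setting automatically. For a one-off test, the same option can be passed on the command line with "ssh -o ServerAliveInterval=60".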

