Current Status Overview

From ALICE Documentation

Revision as of 15:03, 2 February 2021 by Dijkbvan (ALICE usage statistics past 4 hours)

ALICE node status

Login nodes: OK
CPU nodes: OK
GPU nodes: OK
High-memory nodes: OK

Current Issues

  • SSH connections closing after a few minutes of idle time
    • We have received several reports that, since last week, SSH connections to ALICE are closed after being idle for a few minutes. This was not the case before 1 Feb.
    • Changes to the SSH gateway require the client to keep the SSH connection alive. This can be achieved with the ServerAliveInterval setting (e.g., "ServerAliveInterval 60") in your SSH config settings for ALICE.
    • Status: Potential solution posted. Waiting for user feedback.
    • Last Updated: 12 Feb 2021, 15:45 CET
  • Slurm issue with SSH to compute nodes when more than one job is running
    • The current Slurm version has a bug that prevents users from logging into the compute node on which their job is running if two or more jobs are running on that node. We are looking into this.
    • If you try to log into a node that has more than one job running, you will see this error message: "Access denied by pam_slurm_adopt: you have no active jobs on this node Authentication failed."
    • If your job is the only one running on the node, SSH to the node should work without a problem.
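
The keep-alive workaround for the idle SSH connection issue above can be sketched as an entry in the client-side OpenSSH configuration file (typically ~/.ssh/config). The host alias "alice" and the placeholders for the login host name and user name are assumptions for illustration and need to be replaced with your own values. Only "ServerAliveInterval 60" comes from the notice above.

```
# ~/.ssh/config (client side)
# "alice" is a hypothetical alias; replace the placeholders with your own values.
Host alice
    HostName <ALICE login host>
    User <your username>
    # Ask the server for a response after 60 seconds without incoming data,
    # so the connection does not look idle to the gateway
    ServerAliveInterval 60
```

With an entry like this, "ssh alice" would pick up the setting automatically. The same option can also be passed for a single session on the command line as "-o ServerAliveInterval=60".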

ALICE usage statistics past 4 hours

  • Cluster Load

  • Number of running processes