Maintenance day 20201005

From ALICE Documentation

Revision as of 15:46, 5 October 2020 by Schulzrf (talk | contribs) (Cluster status)

Maintenance day on 05 October 2020

Maintenance will be performed on the entire cluster, so it will be offline for the entire day. Please read on to see how this affects your jobs.

This is the current timeline for the maintenance day:

  • Fri, 02 Oct. 2020 at 17:00 CEST: the reservation of the cluster for Mon, 05 Oct. 2020 at 17:00 CEST is activated. You can still submit and run new jobs over the weekend, provided their requested time limit ends before the reservation starts.
  • Mon, 05 Oct. 2020 at 08:00 CEST: any jobs that are still running will be cancelled. Please make sure that your jobs have finished by then.
  • Tue, 06 Oct. 2020: all nodes should be running again and the cluster will be available to you. We expect that jobs that were in the queue but not yet scheduled will remain there until the cluster is back online, but we cannot guarantee this. Please check whether your jobs are still in the queue once the cluster is back online.
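
To judge whether a job can still run before the maintenance, it helps to work out the longest time limit that fits into the remaining window. The following is only a minimal sketch and not part of the official procedure: the function name `max_time_limit`, the 10-minute safety margin, and the example submission time are assumptions made for illustration. It uses the Monday 08:00 CEST cancellation time from the timeline above as the cutoff and prints the result in Slurm's `days-hours:minutes:seconds` format.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# CEST is UTC+2; a fixed offset is sufficient for this short window.
CEST = timezone(timedelta(hours=2), "CEST")


def max_time_limit(
    now: datetime,
    cutoff: datetime,
    margin: timedelta = timedelta(minutes=10),  # assumed safety margin
) -> Optional[str]:
    """Return the longest time limit (D-HH:MM:SS) that ends before `cutoff`.

    Returns None if no time is left once the margin is subtracted.
    """
    remaining = cutoff - now - margin
    if remaining <= timedelta(0):
        return None
    total = int(remaining.total_seconds())
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


# Cutoff taken from the timeline: running jobs are cancelled at this time.
cutoff = datetime(2020, 10, 5, 8, 0, tzinfo=CEST)
# Hypothetical submission time: Saturday at noon.
now = datetime(2020, 10, 3, 12, 0, tzinfo=CEST)

print(max_time_limit(now, cutoff))  # 1-19:50:00
```

The resulting string can be passed to `sbatch --time=...`. A job that requests more time than this would not be able to finish before the cutoff, so it is better to shorten it or submit it after the maintenance.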

Current To-Do list for the maintenance day

  • Update NFS (storage) server: Done
  • Update OS images on all nodes: Done
    • login
    • cpu
    • high-memory
    • gpu
  • Update EasyBuild to version 4.3.0: Done
  • Update Slurm to version 19.05.7-1: Done

This list is subject to change, especially on the maintenance day itself.

Cluster status

The cluster is back up and running.

If you notice any issues, please contact the helpdesk.