Maintenance day 20201005

From ALICE Documentation

Revision as of 15:03, 30 September 2020 by Schulzrf (talk | contribs) (Maintenance day on 05 October 2020)

Maintenance day on 05 October 2020

Maintenance will be performed on the entire cluster, i.e., it will be offline for the entire day. Please have a look below to see how this affects your jobs.

This is the current timeline for the maintenance day:

  • Fri, 02 Oct. 2020 at 17:00 CEST: the reservation of the cluster for Mon, 05 Oct. 2020 at 17:00 CEST will be activated. You can still submit and run new jobs over the weekend, as long as their time limit ends before the reservation starts.
  • Mon, 05 Oct. 2020 at 08:00 CEST: any jobs that are still running will be cancelled. Please make sure that your jobs have finished by then.
  • Tue, 06 Oct. 2020: all nodes should be running again and the cluster will be available to you. We expect that jobs that were in the queue but not yet scheduled will remain there until the cluster is back online, but we cannot guarantee this. Please check whether your jobs are still in the queue once the cluster is back online.
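
To judge whether a job still fits before the maintenance, it can help to work out the longest time limit that ends before running jobs are cancelled on Mon, 05 Oct. 2020 at 08:00 CEST. The following is a minimal sketch, not an official tool: the function names and the 30-minute safety margin are assumptions made for illustration, and the calculation should start from the time the job is expected to start, which may be later than the time it is submitted.

```python
from datetime import datetime, timedelta, timezone

# CEST is UTC+2; a fixed offset avoids depending on a time zone database.
CEST = timezone(timedelta(hours=2), "CEST")

# Cut-off from the timeline: jobs still running at this time are cancelled.
CUTOFF = datetime(2020, 10, 5, 8, 0, tzinfo=CEST)

# Assumption: safety margin for start-up delays; the value is illustrative only.
MARGIN = timedelta(minutes=30)


def max_time_limit(start_time, cutoff=CUTOFF, margin=MARGIN):
    """Return the longest time limit that ends before the cut-off, or None."""
    remaining = cutoff - start_time - margin
    if remaining <= timedelta(0):
        return None
    return remaining


def slurm_time(limit):
    """Format a timedelta as D-HH:MM:SS, a format accepted by Slurm's --time."""
    total = int(limit.total_seconds())
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    limit = max_time_limit(datetime.now(CEST))
    if limit is None:
        print("No time left before the maintenance cut-off.")
    else:
        print(f"Longest usable time limit: --time={slurm_time(limit)}")
```

For example, a job expected to start on Sat, 03 Oct. 2020 at 12:00 CEST would get a limit of 1-19:30:00 under these assumptions, which could then be passed to sbatch via its --time option.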

Current To-Do list for the maintenance day

  • Update OS images on all nodes (login nodes, CPU, high-memory and GPU nodes)
  • Update NFS (storage) server
  • Update EasyBuild to version 4.3.0
  • Update Slurm to version 19.05.7-1

This list is subject to change, especially on the maintenance day itself.

Cluster status

  • Login nodes: 🟢
  • CPU nodes: 🟢
  • GPU nodes: 🟢
  • MEM nodes: 🟢
  • Storage: 🟢

We recommend that you check this page regularly for updates on the status of the cluster before, during and after the maintenance day.