Next Maintenance

From ALICE Documentation

Revision as of 09:16, 2 November 2020 by Deuler

Next Maintenance

The next maintenance window will be on Monday, 02 November 2020.

We have the following task(s) planned:

  1. Roll out updated NVIDIA drivers on all GPU nodes.
  2. Launch the upgraded ALICE wiki.

Important information for users of the GPU nodes/partitions

Updating the GPU nodes requires exclusive access to them, but we hope to need them for only half a day. We will activate a Slurm reservation on all GPU nodes starting at 08:00 CET on Monday, 02 Nov 2020. Any job still running on the GPU nodes or in the GPU partitions at that time will be cancelled, because the nodes have to be restarted. Please make sure that your jobs finish before then.
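To pick a safe time limit for a GPU job, subtract its start time from the reservation start. The Python sketch below does that arithmetic and formats the result in the days-hours:minutes:seconds notation that sbatch --time accepts. It assumes the job starts immediately and uses a fixed UTC+1 offset for CET; the 15-minute safety margin and the helper names are illustrative choices, not part of any ALICE tooling.

```python
from datetime import datetime, timedelta, timezone

# CET is UTC+1 on these dates; the reservation start comes from the announcement.
CET = timezone(timedelta(hours=1), "CET")
RESERVATION_START = datetime(2020, 11, 2, 8, 0, tzinfo=CET)


def format_slurm_time(delta: timedelta) -> str:
    """Format a duration as days-hours:minutes:seconds, a form sbatch --time accepts."""
    total = int(delta.total_seconds())
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


def max_time_limit(start: datetime, margin: timedelta = timedelta(minutes=15)) -> str:
    """Longest --time value (hypothetical helper) for a job starting at `start`
    that still ends `margin` before the GPU reservation begins."""
    remaining = RESERVATION_START - start - margin
    if remaining <= timedelta(0):
        raise ValueError("a job starting now cannot finish before the reservation")
    return format_slurm_time(remaining)


if __name__ == "__main__":
    # Example: a job submitted on Saturday, 31 Oct 2020 at 12:00 CET.
    print(max_time_limit(datetime(2020, 10, 31, 12, 0, tzinfo=CET)))  # 1-19:45:00
```

The margin leaves some slack for job start-up and clean-up; pass a different margin if your jobs need more.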

We will put the reservation in place on Friday, 30 Oct 2020 at 17:00. You can continue to submit and run jobs over the weekend, but a job whose requested run time (its --time limit) extends past the start of the reservation will most likely remain in the queue until the reservation ends.
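The scheduling rule in the paragraph above comes down to a time-overlap check: a GPU job can only start before the maintenance if its start time plus its requested time limit does not pass the reservation start. The sketch below parses the time-limit formats listed in the sbatch documentation and applies that check. It deliberately ignores everything else Slurm weighs (priority, free GPUs, partition limits), does not handle special values such as UNLIMITED, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

CET = timezone(timedelta(hours=1), "CET")
RESERVATION_START = datetime(2020, 11, 2, 8, 0, tzinfo=CET)


def parse_slurm_time(value: str) -> timedelta:
    """Parse an sbatch --time value in one of the documented forms:
    minutes, minutes:seconds, hours:minutes:seconds, days-hours,
    days-hours:minutes or days-hours:minutes:seconds."""
    if "-" in value:
        days, rest = value.split("-", 1)
        fields = [int(f) for f in rest.split(":")]
        if len(fields) > 3:
            raise ValueError(f"unrecognised time limit: {value!r}")
        # After "days-", the fields are hours[:minutes[:seconds]].
        hours, minutes, seconds = (fields + [0, 0])[:3]
        return timedelta(days=int(days), hours=hours, minutes=minutes, seconds=seconds)
    fields = [int(f) for f in value.split(":")]
    if len(fields) == 1:
        return timedelta(minutes=fields[0])
    if len(fields) == 2:
        return timedelta(minutes=fields[0], seconds=fields[1])
    if len(fields) == 3:
        return timedelta(hours=fields[0], minutes=fields[1], seconds=fields[2])
    raise ValueError(f"unrecognised time limit: {value!r}")


def fits_before_reservation(start: datetime, time_limit: str) -> bool:
    """True if a job starting at `start` with this time limit would end
    no later than the reservation start; otherwise it has to wait."""
    return start + parse_slurm_time(time_limit) <= RESERVATION_START


if __name__ == "__main__":
    saturday_evening = datetime(2020, 10, 31, 20, 0, tzinfo=CET)
    print(fits_before_reservation(saturday_evening, "24:00:00"))    # True
    print(fits_before_reservation(saturday_evening, "2-00:00:00"))  # False
```

In this example, a one-day job submitted on Saturday evening still fits, while a two-day job would run past 08:00 on Monday and therefore stays pending.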

To be clear, jobs submitted to or running in the CPU or high-memory partitions will not be affected by the planned maintenance. You can continue to use these partitions as usual.

ALICE node status

Gateway: UP
Head node: UP
Login nodes: UP
GPU nodes: UP
CPU nodes: UP
High memory nodes: UP
Storage: UP
Network: UP

Current Issues

  • No access to ALICE - SSH gateway failure:
    • The SSH gateway was not working.
    • Access to ALICE was not possible. The cluster itself was not affected and processing continued.
    • Update: The gateway is working again and access is possible.
    • Status: SOLVED
    • Last Updated: 02 Jun 2022, 19:45 CET
  • Copying data to the shared scratch via sftp:
    • There is currently an issue on the sftp gateway that prevents users from copying data to their shared scratch directory, i.e., /home/<username>/data.
    • As a workaround, use scp or sftp via the SSH gateway and the login nodes.
    • Status: Work in Progress
    • Last Updated: 30 Nov 2021, 14:56 CET
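The scp workaround for the sftp issue can be written as a single scp call that hops through the SSH gateway with OpenSSH's -J (ProxyJump) option, which recent OpenSSH versions of scp and sftp accept. The Python sketch below only builds that command line. GATEWAY_HOST and LOGIN_NODE_HOST are hypothetical placeholders, not the real ALICE host names, and the sketch assumes your user name is the same on the gateway and the login node.

```python
import shlex

# Hypothetical placeholders: substitute the real gateway and login-node host names.
GATEWAY = "GATEWAY_HOST"
LOGIN_NODE = "LOGIN_NODE_HOST"


def scp_via_gateway(username: str, local_file: str) -> list[str]:
    """Build an scp command that copies local_file into the shared scratch
    directory /home/<username>/data, jumping through the SSH gateway
    instead of using the sftp gateway."""
    return [
        "scp",
        "-J", f"{username}@{GATEWAY}",  # ProxyJump through the SSH gateway
        local_file,
        f"{username}@{LOGIN_NODE}:/home/{username}/data/",
    ]


if __name__ == "__main__":
    cmd = scp_via_gateway("jdoe", "results.tar.gz")  # "jdoe" is an example user
    print(shlex.join(cmd))
    # scp -J jdoe@GATEWAY_HOST results.tar.gz jdoe@LOGIN_NODE_HOST:/home/jdoe/data/
    # Once the placeholders are filled in, subprocess.run(cmd, check=True) runs it.
```

Running the printed command directly in a terminal works just as well; the Python wrapper is only useful if you script many transfers.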

For other recently solved issues, see: Solved Issues