Next Maintenance

From ALICE Documentation

Revision as of 09:16, 2 November 2020 by Deuler (talk | contribs) (Next Maintenance)

The next maintenance window will be on Monday, 02 November 2020.

We have the following task(s) planned:

  1. Roll out updated NVIDIA drivers on all GPU nodes:
    Done
  2. Launch of upgraded ALICE wiki:
    Done


Important for the GPU nodes/partitions

Updating the GPU nodes requires exclusive access to them, though we expect to need them for only half the day. We will activate a Slurm reservation for the GPU nodes starting at 08:00 CET on Monday, 02 Nov 2020. Any jobs still running on the GPU nodes/GPU partitions at that time will have to be cancelled because we need to restart these nodes. Please make sure that your jobs finish before then.

We will put the reservation in place on Friday, 30 Oct 2020 at 17:00. You can continue to submit and run jobs over the weekend, but jobs whose time limit extends past the start of the reservation will likely remain in the queue until after the reservation ends.
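
To make the timing concrete, the following is a minimal sketch of how one might work out the longest time limit a GPU job can request and still end before the reservation begins. The function name `max_time_limit` and the 10-minute safety margin are assumptions made for illustration and are not part of the ALICE setup. The sketch also assumes the job starts at the time you pass in, which is not guaranteed on a busy queue.

```python
from datetime import datetime, timedelta, timezone

# CET is UTC+1; the reservation start is taken from the announcement above.
CET = timezone(timedelta(hours=1), "CET")
RESERVATION_START = datetime(2020, 11, 2, 8, 0, tzinfo=CET)


def max_time_limit(start_time, safety_margin=timedelta(minutes=10)):
    """Return the longest time limit (D-HH:MM:SS) for a job starting at
    start_time that still ends before the reservation, or None if the
    job cannot fit. The safety margin is an illustrative assumption."""
    remaining = RESERVATION_START - start_time - safety_margin
    if remaining <= timedelta(0):
        return None
    total_seconds = int(remaining.total_seconds())
    days, rest = divmod(total_seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


# Example: a job expected to start on Saturday, 31 Oct 2020 at 12:00 CET.
limit = max_time_limit(datetime(2020, 10, 31, 12, 0, tzinfo=CET))
print(limit)  # prints 1-19:50:00
```

The resulting string can be passed to the `--time` option of `sbatch`, which accepts the days-hours:minutes:seconds format.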

To be clear, jobs submitted to or running in the CPU or high-memory partitions will not be affected by the planned maintenance. You can continue to use these partitions as usual.

ALICE node status

CPU nodes: Node015 is out of service
Gateway: UP
Head node: UP
Login nodes: UP
GPU nodes: UP
CPU nodes: UP (except for Node015)
High memory nodes: UP
Storage: UP
Network: UP

Current Issues

  • Node015 out of service:
    • Node015 is out of service because of technical issues. We are in contact with our vendor.
    • Status: Work in Progress
    • Last Updated: 30 Nov 2021, 14:58 CET
  • Copying data to the shared scratch via sftp:
    • There is currently an issue on the sftp gateway that prevents users from copying data to their shared scratch directory, i.e., /home/<username>/data
    • The current workaround is to use scp or sftp via the ssh gateway and the login nodes.
    • Status: Work in Progress
    • Last Updated: 30 Nov 2021, 14:56 CET
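
As an illustration of the scp workaround, here is a minimal sketch that assembles an scp command which hops through the ssh gateway to a login node using the OpenSSH `ProxyJump` option. The host names `gateway.example.org` and `login1.example.org`, the user `jdoe`, and the helper `build_scp_command` are hypothetical placeholders; substitute the actual ALICE gateway and login node addresses and your own username.

```python
import shlex


def build_scp_command(local_path, username, gateway_host, login_host):
    """Build an scp command that reaches a login node via the ssh gateway."""
    # Shared scratch path as given in the issue description above.
    remote_dir = f"/home/{username}/data"
    return [
        "scp",
        "-o", f"ProxyJump={username}@{gateway_host}",
        local_path,
        f"{username}@{login_host}:{remote_dir}/",
    ]


# Hypothetical user and host names, for illustration only.
cmd = build_scp_command(
    "results.tar.gz", "jdoe", "gateway.example.org", "login1.example.org"
)
print(shlex.join(cmd))
```

The printed line can be pasted into a terminal, or the list can be handed to `subprocess.run(cmd, check=True)` to perform the copy directly.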


See here for other recently solved issues: Solved Issues