Difference between revisions of "Latest News"

From ALICE Documentation

=== Latest News ===
 
*'''17 Nov 2021 - Leiden University network maintenance on 20/21 November:''' Maintenance on the network of Leiden University will take place on the weekend of 20/21 November. During this time ALICE will continue to run, but in total isolation, i.e., with no internet access. This means that you will not be able to log in to ALICE, and jobs cannot, for example, pull code, download data or access license servers. During the maintenance, the status will be tracked here: [[News#Next_Maintenance|Next maintenance]].
*'''16 Nov. 2021 - Important update to partition and qos'''. We are working on a general update of the partition system of ALICE to improve the throughput of short and medium-type jobs. However, this update will require a bit more time for evaluation and testing. As an intermediate step, we have made the following changes. If you have any feedback or comments, please contact the ALICE Helpdesk.
** CPU nodes: node001 and node002 have been taken out of the cpu-long partition and node001 has been taken out of the cpu-medium partition. As a result, node001 is now exclusively available for short jobs and node002 for short and medium jobs.
** GPU nodes: Node851 has been taken out of the gpu-long partition. As a result, it is exclusively available to the short and medium partitions.
** The time limit of the short partitions has been raised to 4h.
** Each login node has one NVIDIA Tesla T4 which you can now use as part of the testing partition.
** The number of jobs that users can submit has been increased on all partitions. Please check the page on [[:Running_jobs_on_ALICE#Partition|Partitions]] for details.
*'''16 Nov. 2021 - New e-mail notification'''. The content of the e-mail that is automatically sent out by Slurm has been updated. The notification can now handle array jobs, and it contains more detailed information on the performance and resources used by your job.
 
*'''8 Oct. 2021 - Infiniband network back in operation'''. The broken Infiniband switch has been replaced and the Infiniband network is working again. You can make use of the Infiniband network again for your jobs on the CPU partitions.
 
*'''8 Oct. 2021 - Node020 and node859 used for testing'''. Node020 and node859 will be reserved from time to time to continue testing the new BeeGFS storage system.
 
*'''30 Aug. 2021 - Node020 reserved for testing'''. We have been working on the configuration of the new BeeGFS storage system. For this purpose, we have reserved node020 for running tests.
 

Latest revision as of 10:02, 21 September 2022

Latest News

  • 21 Sep 2022 - Access to ALICE: On 26 Sept 2022 between 18:00 and 18:30, access to ALICE will not be possible due to maintenance on the University cloud platform.
  • 24 Aug 2022 - ALICE available again: Maintenance on ALICE is over. The cluster is online again and available to all users. We apologize for the delay.
  • 23 Aug 2022 - ALICE system maintenance not finished and continues tomorrow: We managed to solve many of the issues that we faced yesterday. We are waiting for the completion of synchronization processes which are part of the high-availability setup procedure. If all goes well, we just need to run a few tests to verify that the new high-availability setup is working properly and all the nodes are coming back. Unfortunately, it was no longer possible to do this today. In case the setup fails after all, we are prepared to revert all the changes and bring ALICE online again. In any case, we expect ALICE to be online again sometime tomorrow afternoon. We are sorry for the delay, but the new high-availability setup is vital for ALICE, which is why we have been working hard to get it done.
  • 22 Aug 2022 - ALICE is offline due to system maintenance - Continues tomorrow: We encountered unexpected technical issues during our highest priority task for this maintenance day, the high-availability setup. Because this is a critical component for the continuing stability of ALICE and we require the cluster to be offline, we decided to continue solving the issues tomorrow and keep the cluster offline.
  • 17 Aug 2022 - REMINDER - ALICE system maintenance on 22 Aug 2022: We will perform system maintenance on ALICE on 22 Aug 2022 between 09:00 and 18:00 CEST. Our primary focus will be the high-availability setup of ALICE in addition to other maintenance tasks. This will require us to take all compute and login nodes of the cluster offline. It will not be possible to run any jobs or access data on ALICE. The login nodes will be rebooted and all active terminal or X2Go sessions will be terminated. Until maintenance starts, you can continue to use ALICE as usual and submit jobs. Slurm will also continue to run your job if the requested running time allows it to finish before the maintenance starts. If you have any questions, please contact the ALICE Helpdesk.
  • 01 Aug 2022 - ALICE system maintenance on 22 Aug 2022 - First announcement: We will perform system maintenance on ALICE on 22 Aug 2022 between 09:00 and 18:00 CEST. Our primary focus will be the high-availability setup of ALICE in addition to other maintenance tasks. This will require us to take all compute and login nodes of the cluster offline. It will not be possible to run any jobs or access data on ALICE. Until maintenance starts, you can continue to use ALICE as usual and submit jobs. Slurm will also continue to run your job if the requested running time allows it to finish before the maintenance starts. If you have any questions, please contact the ALICE Helpdesk.
  • 01 Jun 2022 - Disabled access to old scratch storage: As previously announced, we have disabled access to the old scratch storage. We will keep the data available until 30 June 2022. Afterwards, we will start to delete data so that we can repurpose the storage within ALICE. You can request temporary access by contacting the ALICE Helpdesk. See also the wiki page: Data Storage.
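
The maintenance announcements above note that Slurm keeps running a job only if its requested running time lets it finish before the maintenance starts. As a rough illustration of that rule, the following Python sketch checks whether a requested time limit fits before 22 Aug 2022, 09:00. It is not Slurm's scheduler logic: the function names `parse_time_limit` and `fits_before_maintenance` are hypothetical, the accepted strings follow the time formats documented for `sbatch --time`, and treating a job that ends exactly at the start of maintenance as fitting is an assumption.

```python
from datetime import datetime, timedelta

# Start of the maintenance window from the announcement (22 Aug 2022, 09:00 CEST).
MAINTENANCE_START = datetime(2022, 8, 22, 9, 0)


def parse_time_limit(limit: str) -> timedelta:
    """Parse a Slurm-style time limit such as "30", "4:00:00" or "1-12:00:00".

    Hypothetical helper: it covers the formats listed for `sbatch --time`
    (minutes, minutes:seconds, hours:minutes:seconds, days-hours,
    days-hours:minutes, days-hours:minutes:seconds).
    """
    days = 0
    if "-" in limit:
        # With a day count, the remaining fields are hours[:minutes[:seconds]].
        day_part, rest = limit.split("-", 1)
        days = int(day_part)
        fields = [int(f) for f in rest.split(":")]
        fields += [0] * (3 - len(fields))
        hours, minutes, seconds = fields
    else:
        fields = [int(f) for f in limit.split(":")]
        if len(fields) == 1:    # minutes
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:  # minutes:seconds
            hours, minutes, seconds = 0, fields[0], fields[1]
        else:                   # hours:minutes:seconds
            hours, minutes, seconds = fields
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def fits_before_maintenance(now: datetime, limit: str,
                            maintenance_start: datetime = MAINTENANCE_START) -> bool:
    """Return True if a job started at `now` would end by `maintenance_start`."""
    return now + parse_time_limit(limit) <= maintenance_start
```

For example, `fits_before_maintenance(datetime(2022, 8, 22, 4, 0), "4:00:00")` asks whether a job requesting four hours and starting at 04:00 on the maintenance day would still finish in time. Requesting a shorter running time makes it more likely that a job can still start before the maintenance.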