

From ALICE Documentation


Revision as of 10:28, 2 November 2020


News

Latest News

  • 25 Nov. 2021 - Testing of news as login messages: To communicate changes, news and announcements regarding ALICE to you more effectively, we have started testing an abbreviated version of them as login messages. These are shown when you log in to one of the login nodes. Because login messages only allow shortened versions, the full news item will continue to be available only on the ALICE wiki.
  • 19 Nov. 2021 - Important update to job limits (QOS): Following a review of previous changes, we have made additional adjustments to Slurm's QOS settings, which handle the limits on the amount of resources your job can request in each partition:
    • There is no longer a limit on the number of jobs that you can submit, except on the testing partition.
    • We have introduced limits on the number of CPUs and nodes that can be allocated. Please check the Partitions page for details.
  • 17 Nov. 2021 - Leiden University network maintenance on 20/21 November: Maintenance on the network of Leiden University will take place on the weekend of 20/21 November. During this time ALICE will continue to run, but in total isolation, i.e., with no internet access. This means that you will not be able to log in to ALICE, and jobs cannot, for example, pull code, download data or access license servers. We will try to track the status of ALICE here (Next maintenance) during the maintenance, but University websites such as this wiki might not be reachable.
  • 16 Nov. 2021 - Important update to partitions and QOS: We are working on a general update of the partition system of ALICE to improve the throughput of short and medium-type jobs. However, this update will require more time for evaluation and testing. As an intermediate step, we have made the following changes. If you have any feedback or comments, please contact the ALICE helpdesk.
    • CPU nodes: node001 and node002 have been taken out of the cpu-long partition and node001 has been taken out of the cpu-medium partition. As a result, node001 is now exclusively available for short jobs and node002 for short and medium jobs.
    • GPU nodes: Node851 has been taken out of the gpu-long partition. As a result, it is exclusively available to the short and medium partitions.
    • The time limit of the short partitions has been raised to 4 hours.
    • Each login node has one NVIDIA Tesla T4 which you can now use as part of the testing partition.
    • The number of jobs that users can submit has been increased on all partitions. Please check the Partitions page for details. (See also the news from 19 Nov. 2021.)
  • 16 Nov. 2021 - New e-mail notification: The content of the e-mail that is automatically sent out by Slurm has been updated. The notification can now handle array jobs, and it contains more detailed information on the performance and resources used by your job.
  • 8 Oct. 2021 - Infiniband network back in operation: The broken Infiniband switch has been replaced and the Infiniband network is working again. You can once more use the Infiniband network for your jobs on the CPU partitions.
  • 8 Oct. 2021 - Node020 and node859 used for testing: Node020 and node859 will be reserved from time to time to continue testing the new BeeGFS storage system.
  • 30 Aug. 2021 - Node020 reserved for testing: We have been working on the configuration of the new BeeGFS storage system. For this purpose, we have reserved node020 for running tests.
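
The partition changes of 16 Nov. 2021 and the CPU/node limits of 19 Nov. 2021 can be summarised as a small lookup table. The sketch below is illustrative only. It encodes just the node moves and the 4-hour short-partition limit described above. The partition names cpu-short, gpu-short and gpu-medium are assumptions, because the news only names cpu-long, cpu-medium and gpu-long. All numeric CPU/node limits are hypothetical placeholders, not the actual ALICE QOS values, which are listed on the Partitions page.

```python
from dataclasses import dataclass

# Node membership after the 16 Nov. 2021 changes, restricted to the nodes named
# in that announcement. Only cpu-long, cpu-medium and gpu-long are named in the
# news; cpu-short, gpu-short and gpu-medium are ASSUMED partition names.
NODE_PARTITIONS = {
    "node001": {"cpu-short"},                # taken out of cpu-long and cpu-medium
    "node002": {"cpu-short", "cpu-medium"},  # taken out of cpu-long
    "node851": {"gpu-short", "gpu-medium"},  # taken out of gpu-long
}

# Time limit of the short partitions after the change (4 hours, from the news item).
SHORT_TIME_LIMIT_HOURS = 4

# HYPOTHETICAL CPU/node limits per partition: placeholders only, NOT the real
# ALICE QOS values (see the Partitions page for those).
QOS_LIMITS = {
    "cpu-short": {"max_cpus": 128, "max_nodes": 4},
    "cpu-medium": {"max_cpus": 96, "max_nodes": 3},
}


@dataclass
class JobRequest:
    partition: str
    cpus: int
    nodes: int
    hours: float


def nodes_in(partition):
    """Return the listed nodes that serve the given partition."""
    return sorted(n for n, parts in NODE_PARTITIONS.items() if partition in parts)


def check_request(job):
    """Return the reasons why a request would exceed the limits above."""
    problems = []
    if job.partition.endswith("-short") and job.hours > SHORT_TIME_LIMIT_HOURS:
        problems.append(f"walltime {job.hours} h exceeds {SHORT_TIME_LIMIT_HOURS} h")
    limits = QOS_LIMITS.get(job.partition)
    if limits is not None:
        if job.cpus > limits["max_cpus"]:
            problems.append(f"{job.cpus} CPUs exceeds {limits['max_cpus']}")
        if job.nodes > limits["max_nodes"]:
            problems.append(f"{job.nodes} nodes exceeds {limits['max_nodes']}")
    return problems


if __name__ == "__main__":
    print(nodes_in("cpu-short"))  # ['node001', 'node002']
    print(nodes_in("cpu-long"))   # []
    print(check_request(JobRequest("cpu-short", cpus=8, nodes=1, hours=5)))
    # ['walltime 5 h exceeds 4 h']
```

In this model, node001 only serves the short partition and node002 serves the short and medium partitions. This matches the stated intent of keeping these nodes free for shorter jobs.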

Older News

  • 19 Oct. 2020 - TensorFlow update: We have installed a new version of TensorFlow/2.2.0-fosscuda-2019b-Python-3.7.4. The module is not yet set as the default, so you have to load it explicitly. As soon as we make it the default, it will be announced here.
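
Because the new TensorFlow module is not the default, a job has to load it explicitly, e.g. with `module load TensorFlow/2.2.0-fosscuda-2019b-Python-3.7.4`. A Python job can then check that the intended version was actually loaded before doing any work. The sketch below assumes that the module system exports the colon-separated `LOADEDMODULES` environment variable, as Environment Modules and Lmod usually do. Please verify this on ALICE before relying on it.

```python
import os

# Module name as given in the 19 Oct. 2020 announcement.
WANTED = "TensorFlow/2.2.0-fosscuda-2019b-Python-3.7.4"


def loaded_modules(env=None):
    """Split the colon-separated LOADEDMODULES variable into module names.

    ASSUMPTION: the module system (Environment Modules or Lmod) exports
    LOADEDMODULES in the job environment.
    """
    env = os.environ if env is None else env
    return [m for m in env.get("LOADEDMODULES", "").split(":") if m]


def has_module(name, env=None):
    """True if exactly this module name and version is currently loaded."""
    return name in loaded_modules(env)


if __name__ == "__main__":
    status = "loaded" if has_module(WANTED) else f"NOT loaded; run 'module load {WANTED}'"
    print(f"{WANTED}: {status}")
```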

Maintenance

This section is used to announce upcoming maintenance and to provide information before, during and after it. For general information about our maintenance policy, please see the maintenance policy page.

Next Maintenance

Leiden University network maintenance on 20/21 November

Maintenance on the network of Leiden University will take place on the weekend of 20/21 November. The official announcement from the University can be found on the University webpage.

During this time ALICE will continue to run, but in total isolation, i.e., with no internet access. This means that you will not be able to log in to ALICE, and jobs cannot, for example, pull code, download data or access license servers.

We will use this page to provide updates on the status of the cluster.

If you have any questions, please contact the ALICE Helpdesk.

Previous Maintenance days

ALICE node status

CPU nodes: Node015 is out-of-service
Gateway: UP
Head node: UP
Login nodes: UP
GPU nodes: UP
CPU nodes: UP (except for node015)
High memory nodes: UP
Storage: UP
Network: UP

Current Issues

  • Node015 out of service:
    • Node015 is out of service because of technical issues. We are in contact with our vendor.
    • Status: Work in Progress
    • Last Updated: 30 Nov 2021, 14:58 CET
  • Copying data to the shared scratch via sftp:
    • There is currently an issue on the sftp gateway which prevents users from copying data to their shared scratch directory, i.e., /home/<username>/data.
    • The current work-around is to use scp or sftp via the ssh gateway and the login nodes.
    • Status: Work in Progress
    • Last Updated: 30 Nov 2021, 14:56 CET
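
The work-around above routes the copy through the ssh gateway to a login node instead of through the sftp gateway. With OpenSSH this can be done in one step using the `ProxyJump` option passed to scp with `-o`. The sketch below only builds and prints the command. The host names `ssh-gateway.example.org` and `login1.example.org` are hypothetical placeholders for the actual ALICE ssh gateway and login node addresses, and `jdoe` is an example username.

```python
import shlex

# HYPOTHETICAL host names: replace with the real ALICE ssh gateway and login node.
GATEWAY = "ssh-gateway.example.org"
LOGIN_NODE = "login1.example.org"


def scp_via_gateway(local_path, username, subdir=""):
    """Build an scp command that jumps through the ssh gateway to a login node
    and copies local_path into the shared scratch /home/<username>/data."""
    target_dir = f"/home/{username}/data/{subdir}".rstrip("/") + "/"
    return [
        "scp",
        "-o", f"ProxyJump={username}@{GATEWAY}",  # hop through the ssh gateway
        local_path,
        f"{username}@{LOGIN_NODE}:{target_dir}",  # destination on the login node
    ]


if __name__ == "__main__":
    cmd = scp_via_gateway("results.tar.gz", "jdoe")
    print(shlex.join(cmd))
    # To perform the copy: import subprocess; subprocess.run(cmd, check=True)
```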


See here for other recently solved issues: Solved Issues

Publications

Articles with acknowledgements to the use of ALICE

Astronomy and Astrophysics

Computer Sciences

  • Better Distractions: Transformer-based Distractor Generation and Multiple Choice Question Filtering, Offerijns, J., Verberne, S., Verhoef, T., eprint arXiv:2010.09598, (October 2020), https://arxiv.org/abs/2010.09598

Leiden researchers and their use of HPC

News articles featuring ALICE

  • Hazardous Object Identifier: Supercomputer Helps to Identify Dangerous Asteroids, Oliver Peckham, HPCwire, 04 March 2020
  • Elf reuzestenen op ramkoers met de aarde? (Eleven giant rocks on a collision course with Earth?), Annelies Bes, 13 February 2020, Kijk Magazine
  • Leidse sterrenkundigen ontdekken aardscheerders-in-spé (Leiden astronomers discover would-be Earth grazers), NOVA, 12 February 2020