Difference between revisions of "Alice Wiki Pages"

From ALICE Documentation


Revision as of 08:12, 29 June 2020

Collapsible Full Navigation Tree

About ALICE
Off to research computing Wonderland

ALICE (Academic Leiden Interdisciplinary Cluster Environment) is the high-performance computing (HPC) facility of the partnership between Leiden University and Leiden University Medical Center (LUMC). It is available to all researchers from both partners. Leiden University and LUMC aim to help deliver cutting-edge research using innovative technology within the broad area of data-centric HPC. Both partners are responsible for the hosting, system support, scientific support and service delivery of several large super-computing and research data storage resources for the Leiden research community.

This wiki is the main source of documentation about the ALICE cluster.

Research Acknowledgement

We request that you acknowledge the use of the ALICE compute facilities in all publications and presentations which use any results generated through your use of ALICE. The following acknowledgement can be used:

"This work was performed using the compute resources from the Academic Leiden Interdisciplinary Cluster Environment (ALICE) provided by Leiden University."

or, in a shorter version:

"This work was performed using the ALICE compute resources provided by Leiden University."

We also request that you send copies of (or links to) your publications that acknowledge the use of ALICE resources to helpdesk@alice.leidenuniv.nl (mail request).

Why ALICE

High-Performance Computing (HPC), previously the domain of theoretical scientists and computer and software developers, is becoming ever more important as a research tool in many research areas. An HPC facility that provides serious computational capabilities, combined with easy and flexible local access, is a strong advantage for these research areas. ALICE is the HPC facility that answers those needs for Leiden University (LU) and Leiden University Medical Center (LUMC). It is available to all researchers and students from both LU and LUMC.

The ALICE facility currently implemented is a first-phase edition of what will be a larger hybrid HPC facility for research, one that exceeds the capabilities of what individual institutes can build and provides a stepping stone to the larger national facilities. Although the current implementation is located at two data centres (LU and LUMC), it is a single facility.

The facility aims to be an easily accessible, easily usable system with extensive local support at all levels of expertise. Given the expected diverse use, diversity is implemented in all aspects of computing, namely: the number of CPUs and GPUs and the ratio between the two; the size of the main memory available to the CPUs; the data storage size and location; and the speed of the network.

ALICE is not only a sophisticated production machine but also a tool for teaching all aspects of HPC and a learning machine for young researchers to prepare themselves for national and international HPC.

Overview of the cluster

Conceptual View of ALICE

The ALICE cluster is a hybrid cluster consisting of:

  • 2 login nodes (4 TFlops)
  • 20 CPU nodes (40 TFlops)
  • 10 GPU nodes (40 GPU, 20 TFlops CPU + 536 TFlops GPU)
  • 1 High Memory CPU node (4 TFlops)
  • Storage Device (31 * 15 + 70 = 535 TB)

In summary: 604 TFlops, 816 cores (1632 threads), 14.4 TB RAM.
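
The TFlops total and the storage size in the summary can be cross-checked against the component list. The following is only a minimal sketch of that arithmetic in Python; the dictionary keys are illustrative labels, and the numbers are copied from the list above.

```python
# TFlops per component, copied from the component list above.
# The key names are illustrative labels, not official node names.
tflops = {
    "login nodes": 4,
    "CPU nodes": 40,
    "GPU nodes (CPU part)": 20,
    "GPU nodes (GPU part)": 536,
    "high-memory CPU node": 4,
}
total_tflops = sum(tflops.values())

# Storage as written in the list: 31 * 15 + 70 TB.
storage_tb = 31 * 15 + 70

print(f"Total compute: {total_tflops} TFlops")  # 604
print(f"Total storage: {storage_tb} TB")        # 535
```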

ALICE has a second high-memory node. This node is not included above as it is only available to the research group which purchased the node.

You can find a more comprehensive description of the individual components of ALICE in the section Hardware Description. Also see a photo gallery of the hardware.

ALICE is a pre-configuration system for the university to gain experience with managing, supporting and operating a university-wide HPC system. Once the system and governance have proven to be a functional research asset, it will be extended and continued for the coming years.

The descriptions are for the configuration which is housed partly in the data centre at LMUY and partly in the data centre at Leiden University Medical Center (LUMC).

Future plans

ALICE will be expanded over the coming years. Apart from our own expansion plans, we are always open to collaborating with other groups/institutes on expanding and improving ALICE.

The expansions currently being discussed include:

  • Expansion of temporary storage (first half 2021)
    • We are in the process of getting a 250 TB parallel storage system that will run BeeGFS.
  • Upgrading login nodes with GPUs (Q2 2021)
    • We are planning to add one NVIDIA Tesla T4 to each login node.
  • Additional GPU nodes (estimated second half of 2021)
  • Additional CPU nodes (estimated Q3 2022)

...

How to get involved with ALICE

There are several levels at which you can get involved. They are fully detailed in Getting involved, in-depth. For now, you probably want to start getting to know the system and take your first (baby) steps toward High Performance Computing. This wiki has several pages explaining details of the steps toward using ALICE. These steps are:

Costs overview

Currently, access to the ALICE cluster and related services is provided free of charge to all researchers and students at LU and LUMC.


Current Status Overview

News

Latest News

  • 23 Jul. 2021 - Leiden University network maintenance on 31 Jul/01 Aug: Maintenance on the network of Leiden University will take place on the weekend of 31 July/01 August. During this time ALICE will continue to run, but in total isolation, i.e., with no internet access. This means that you will not be able to log in to ALICE, and jobs cannot, for example, pull code, download data or access license servers. During the maintenance, the status will be tracked here: Next maintenance
  • 29 Jun. 2021 - ALICE system maintenance finished (Update): System maintenance has finished and ALICE is available again.
    • However, one issue remains:
    • The Infiniband network is down due to technical issues on the Infiniband switch.
    • The earlier issue with login node 1 has been resolved (see the list of changes below). While the node was down, connections intended for login1 were automatically routed to login2, so there was no need to change your ssh configs.
    • List of changes:
      • Login node 1 is running and the NVIDIA Tesla T4 has been integrated successfully. Instructions on using the T4 will follow soon.
      • Slurm version 20.11.7 is now running on ALICE
      • EasyBuild 4.4.0 is used for the Intel and AMD branch
      • The partitions notebook-gpu, notebook-cpu, playground-cpu, playground-gpu have been removed.
      • The time limit on the mem partition has been changed from Infinite to 14 days.
      • Resources on the testing partitions are now limited to 15 CPUs per node, a maximum of 150G of memory per node, and a default memory per CPU of 10G.
  • 28 Jun. 2021 - ALICE system maintenance continues tomorrow: During our maintenance, we encountered a few issues with the Infiniband switch and login node 01. Because of the issues, we also did not finish updating the GPU nodes. We will continue working on these items tomorrow (Tuesday, 29 June 2021) until at least 12:00. ALICE will remain offline for maintenance.
  • 27 Jun. 2021 - ALICE offline for system maintenance: More information here: Next maintenance.
  • 25 Jun. 2021 - System maintenance on ALICE: ALICE will undergo system maintenance on 28 June 2021. More information here: Next maintenance.
  • 2 Jun. 2021 - Rclone available on ALICE: Rclone is available on ALICE and there are instructions on how to set it up to transfer files to and from SurfDrive and ResearchDrive: Data transfer to and from ALICE. This is a new feature and feedback on your experience is very welcome.
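
To illustrate the testing-partition limits from the 29 Jun. 2021 item above (15 CPUs per node, at most 150G of memory per node, default of 10G per CPU), here is a minimal Python sketch that checks a per-node request against those numbers. The function check_testing_request is a hypothetical helper written for this page, not part of Slurm or of the ALICE tooling, and it assumes that a request without an explicit memory value gets the per-CPU default multiplied by the number of CPUs.

```python
# Limits quoted in the 29 Jun. 2021 news item for the testing partitions.
MAX_CPUS_PER_NODE = 15
MAX_MEM_PER_NODE_GB = 150
DEFAULT_MEM_PER_CPU_GB = 10


def check_testing_request(cpus_per_node, mem_per_node_gb=None):
    """Return a list of problems with a per-node request (empty if it fits).

    Hypothetical helper for illustration only. Assumes that, when no memory
    is requested, the default is DEFAULT_MEM_PER_CPU_GB per requested CPU.
    """
    if mem_per_node_gb is None:
        mem_per_node_gb = cpus_per_node * DEFAULT_MEM_PER_CPU_GB

    problems = []
    if cpus_per_node > MAX_CPUS_PER_NODE:
        problems.append(
            f"{cpus_per_node} CPUs per node exceeds the limit of {MAX_CPUS_PER_NODE}"
        )
    if mem_per_node_gb > MAX_MEM_PER_NODE_GB:
        problems.append(
            f"{mem_per_node_gb}G per node exceeds the limit of {MAX_MEM_PER_NODE_GB}G"
        )
    return problems


# 15 CPUs with the default memory (15 * 10G = 150G) fits exactly.
print(check_testing_request(15))
# 4 CPUs with 200G of memory exceeds the memory limit.
print(check_testing_request(4, mem_per_node_gb=200))
```

Under these assumptions, the CPU limit and the default memory per CPU together land exactly on the 150G memory limit, so a request that uses all 15 CPUs should leave the memory setting at its default or lower.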

Older News

For older news, please have a look at the news archive: News Archive

Maintenance

This section is used to announce upcoming maintenance and provide information before, during and after it. For general information about our maintenance policy, please have a look here: To maintenance policy

Next Maintenance

Leiden University network maintenance on 31 Jul/01 Aug

Maintenance on the network of Leiden University will take place on the weekend of 31 July/01 August.

During this time ALICE will continue to run, but in total isolation, i.e., with no internet access. This means that you will not be able to log in to ALICE, and jobs cannot, for example, pull code, download data or access license servers.

We will use this page to provide updates on the status of the cluster.

If you have any questions, please contact the ALICE Helpdesk.

Previous Maintenance days

Publications

Articles with acknowledgements to the use of ALICE

Astronomy

Computer Sciences

  • Better Distractions: Transformer-based Distractor Generation and Multiple Choice Question Filtering, Offerijns, J., Verberne, S., Verhoef, T., eprint arXiv:2010.09598, (October 2020), https://arxiv.org/abs/2010.09598

Leiden researchers and their use of HPC

News articles featuring ALICE

  • Hazardous Object Identifier: Supercomputer Helps to Identify Dangerous Asteroids, Oliver Peckham, HPCwire, 04 March 2020, link
  • Elf reuzestenen op ramkoers met de aarde?, Annelies Bes, 13 February 2020, Kijk Magazine, link
  • Leidse sterrenkundigen ontdekken aardscheerders-in-spé, NOVA, 12 February 2020, link