Site structure
This wiki is the main documentation source for information about the Academic Leiden Interdisciplinary Cluster Environment (ALICE) cluster. ALICE is the research computing facility of the partnership between Leiden University and Leiden University Medical Center. It is available to any researcher from both partners. In this wiki, we will introduce the ALICE cluster in detail.
Below you will find a collapsible tree structure of our documentation set, which gives you a quick overview of the full navigation tree. The root tree items are also directly accessible from the top navigation bar.
About ALICE
ALICE (Academic Leiden Interdisciplinary Cluster Environment) is the high-performance computing (HPC) facility of the partnership between Leiden University and Leiden University Medical Center (LUMC). It is available to any researcher from both partners. Leiden University and LUMC aim to help deliver cutting-edge research using innovative technology within the broad area of data-centric HPC. Both partners are responsible for the hosting, system support, scientific support and service delivery of several large supercomputing and research data storage resources for the Leiden research community.
This wiki is the main source of documentation about the ALICE cluster.
Research Acknowledgement
We request that you acknowledge the use of the ALICE compute facilities in all publications and presentations that present results generated through your use of ALICE. The following acknowledgement can be used:
"This work was performed using the compute resources from the Academic Leiden Interdisciplinary Cluster Environment (ALICE) provided by Leiden University."
or, in a shorter version:
"This work was performed using the ALICE compute resources provided by Leiden University."
We also request that you send copies of your publications that acknowledge the use of ALICE resources (or provide the links) to helpdesk@alice.leidenuniv.nl.
Why ALICE
High-Performance Computing (HPC), previously the domain of theoretical scientists and computer and software developers, is becoming ever more important as a research tool in many research areas. An HPC facility, providing serious computational capabilities combined with easy and flexible local access, is a strong advantage for these research areas. ALICE is the HPC facility that answers those needs for Leiden University (LU) and Leiden University Medical Center (LUMC). It is available to all researchers and students from both LU and LUMC.
The ALICE facility as currently implemented is the first phase of what will become a larger hybrid HPC facility for research, exceeding the capabilities of what individual institutes can build and providing a stepping stone to the larger national facilities. Although the current implementation is spread over two data centres (LU and LUMC), it operates as a single cluster.
The facility aims to be an easily accessible, easily usable system with extensive local support at all levels of expertise. Given the expected diverse use, diversity is implemented in all aspects of computing, namely: the number of CPUs and GPUs and the ratio between the two; the amount of core memory per CPU; the size and location of data storage; and the speed of the network.
ALICE provides not only a sophisticated production machine but also a tool for teaching all aspects of HPC, and a training platform on which young researchers can prepare for national and international HPC facilities.
Overview of the cluster
The ALICE cluster is a hybrid cluster consisting of
- 2 login nodes (4 TFlops)
- 20 CPU nodes (40 TFlops)
- 10 GPU nodes (40 GPUs, 20 TFlops CPU + 536 TFlops GPU)
- 1 High Memory CPU node (4 TFlops)
- Storage Device (31 * 15 + 70 = 535 TB)
In summary: 604 TFlops, 816 cores (1632 threads), 14.4 TB RAM.
ALICE has a second high memory node. This node is not included above as it is only available to the research group which purchased it.
You can find a more comprehensive description of the individual components of ALICE in the section Hardware Description. Also see a photo gallery of the hardware.
ALICE is a pilot configuration with which the university gains experience in managing, supporting and operating a university-wide HPC system. Once the system and its governance have proven to be a functional research asset, it will be extended and continued in the coming years.
The descriptions are for the configuration which is housed partly in the data centre at Leiden University (LU) and partly in the data centre at Leiden University Medical Center (LUMC).
Future plans
ALICE will be expanded over the coming years. Apart from our own expansion plans, we are always open to collaborate with other groups/institutes on expanding and improving ALICE.
The expansions that are currently being discussed include:
- Expansion of temporary storage (first half of 2021)
  - We are in the process of acquiring a 250 TB parallel storage system that will run BeeGFS.
- Upgrading login nodes with GPUs (Q2 2021)
  - We are planning to add one NVIDIA Tesla T4 to each login node.
- Additional GPU nodes (estimated second half of 2021)
- Additional CPU nodes (estimated Q3 2022)
...
How to get involved with ALICE
There are several levels at which you can get involved; they are fully detailed in Getting involved, in-depth. For now, you probably want to start getting to know the system and take your first (baby) steps toward High-Performance Computing. This wiki has several pages explaining the steps toward using ALICE. These steps are:
- User guides: Getting an account and Connecting to ALICE
- Understanding how to work on an HPC cluster: see HPC Basic concepts
- Preparing for your first job (a minimal sketch follows this list)
- Learning more advanced stuff
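To give a first taste of these steps, here is a minimal sketch of connecting to the cluster and submitting a job. The gateway and login-node hostnames and the partition name below are placeholders, not the actual ALICE names; the user guides listed above give the real values.

    # Connect to a login node through the SSH gateway.
    # Hostnames are placeholders; see Connecting to ALICE for the real ones.
    ssh -J <username>@gateway.example.nl <username>@login1.example.nl

    # Create a minimal Slurm batch script (the partition name is a placeholder).
    cat > first_job.slurm <<'EOF'
    #!/bin/bash
    #SBATCH --job-name=first_job
    #SBATCH --partition=cpu-short
    #SBATCH --ntasks=1
    #SBATCH --mem=1G
    #SBATCH --time=00:10:00
    echo "Hello from $(hostname)"
    EOF

    # Submit the job and check its place in the queue.
    sbatch first_job.slurm
    squeue -u <username>

Once the job finishes, its output appears by default in a file named slurm-<jobid>.out in the directory you submitted from.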
Costs overview
Currently, access to the ALICE cluster and related services is provided free of charge to all researchers and students at LU and LUMC.
Documentation Overview
This user guide will help you get started if you are new to ALICE and to working on an HPC cluster in general.
- General documentation
  - About ALICE
  - System description
  - HPC Software
  - Policies
- More in-depth
  - Security and privacy
  - More on login nodes
  - Using Node802
  - Running jobs
  - Other
- Software
  - Available software
  - Develop your own code
  - Running in parallel
  - Software packages
News
Latest News
- 01 Aug 2022 - ALICE system maintenance on 22 Aug 2022 - First announcement: We will perform system maintenance on ALICE on 22 Aug 2022 between 09:00 and 18:00 CEST. Our primary focus will be the high-availability setup of ALICE, in addition to other maintenance tasks. This will require us to take all compute and login nodes of the cluster offline. It will not be possible to run any jobs or to access data on ALICE. Until maintenance starts, you can continue to use ALICE as usual and submit jobs. Slurm will also continue to run your job if the requested running time allows it to finish before the maintenance starts. If you have any questions, please contact the ALICE Helpdesk.
- 01 Jun 2022 - Disabled access to old scratch storage: As previously announced, we have disabled access to the old scratch storage. We will keep the data available until 30 June 2022. Afterwards, we will start to delete data so that we can repurpose the storage within ALICE. You can request temporary access by contacting the ALICE Helpdesk. See also the wiki page: Data Storage.
Older News
For older news, please have a look at the news archive: News Archive
Events
Here you can find information about upcoming events related to ALICE.
Maintenance
This section is used to announce upcoming maintenance and provide information before, during and after it. For general information about our maintenance policy, please have a look at the Maintenance policy page.
Next Maintenance
System Maintenance on ALICE will take place on 22 Aug 2022 between 09:00 and 18:00 CEST (See the Maintenance Announcement)
We will perform system maintenance on the ALICE HPC cluster on Monday 22 August 2022 between 09:00 and 18:00.
On this day, it will not be possible to run any jobs or to access data on ALICE. Until maintenance starts, you can continue to use ALICE as usual and submit jobs. Slurm will also continue to run your job if the requested running time allows it to finish before the maintenance starts, as illustrated below.
Our primary focus will be the high-availability setup of ALICE, in addition to other maintenance tasks.
We understand that this represents an inconvenience for you. If you have any questions, please contact the ALICE Helpdesk.
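To make the scheduling behaviour concrete: whether a job can still start before the maintenance window depends on its requested time limit. A sketch, assuming a job script named my_job.slurm (a placeholder name):

    # Maintenance starts at 09:00 on 22 Aug 2022. A job submitted at 07:00
    # with a one-hour limit can still start: 07:00 + 1h ends before 09:00.
    sbatch --time=01:00:00 my_job.slurm

    # The same job with a four-hour limit would end after 09:00, so Slurm
    # keeps it pending until the maintenance is over.
    sbatch --time=04:00:00 my_job.slurm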
Previous Maintenance days
ALICE node status
- Gateway: UP
- Head node: UP
- Login nodes: UP
- GPU nodes: UP
- CPU nodes: UP
- High memory nodes: UP
- Storage: UP
- Network: UP
Current Issues
- No access to ALICE - SSH gateway failure:
  - The SSH gateway is currently not working. Access to ALICE is not possible. The cluster itself is not affected and processing continues.
  - Update: The gateway is working again. Access is possible.
  - Status: SOLVED
  - Last Updated: 02 Jun 2022, 19:45 CET
- Copying data to the shared scratch via sftp:
  - There is currently an issue on the sftp gateway which prevents users from copying data to their shared scratch directory, i.e., /home/<username>/data
  - A current work-around is to use scp or sftp via the ssh gateway and the login nodes (see the sketch below).
  - Status: Work in Progress
  - Last Updated: 30 Nov 2021, 14:56 CET
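As a sketch of this work-around, data can be copied through the ssh gateway and a login node in one step with OpenSSH's ProxyJump option. The hostnames below are placeholders, not the actual ALICE names; the target path is the shared scratch directory mentioned above.

    # Copy a local file to the shared scratch directory via the ssh gateway
    # and a login node (hostnames are placeholders).
    scp -o ProxyJump=<username>@gateway.example.nl \
        mydata.tar.gz <username>@login1.example.nl:/home/<username>/data/

    # Or open an interactive sftp session the same way.
    sftp -o ProxyJump=<username>@gateway.example.nl <username>@login1.example.nl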
See here for other recently solved issues: Solved Issues
Publications
Articles with acknowledgements to the use of ALICE
Astronomy and Astrophysics
- High-level ab initio quartic force fields and spectroscopic characterization of C2N−, Rocha, C. M. R. and Linnartz, H., Phys. Chem. Chem. Phys., November 2021, DOI: https://doi.org/10.1039/D1CP03505C
- Effects of stellar density on the photoevaporation of circumstellar discs, Concha-Ramirez, F. et al., MNRAS, 501, 1782 (February 2021), DOI: https://doi.org/10.1093/mnras/staa3669
- Lucky planets: how circum-binary planets survive the supernova in one of the inner-binary components, Fagginger Auer, F. & Portegies Zwart, S., eprint arXiv:2101.08033, Submitted to SciPost Astronomy (January 2021), https://ui.adsabs.harvard.edu/link_gateway/2021arXiv210108033F/arxiv:2101.08033
- Trimodal structure of Hercules stream explained by originating from bar resonances, Asano, T. et al., MNRAS, 499, 2416 (December 2020), DOI: https://doi.org/10.1093/mnras/staa2849
- Oort cloud Ecology II: Extra-solar Oort clouds and the origin of asteroidal interlopers, Portegies Zwart, S., eprint arXiv:2011.08257, accepted for publication by A&A, (November 2020), https://ui.adsabs.harvard.edu/link_gateway/2020arXiv201108257P/arxiv:2011.08257
- The ecological impact of high-performance computing in astrophysics, Portegies Zwart, S., Nature Astronomy, 4, 819–822 (September 2020), DOI: https://doi.org/10.1038/s41550-020-1208-y
Computer Sciences
- Better Distractions: Transformer-based Distractor Generation and Multiple Choice Question Filtering, Offerijns, J., Verberne, S., Verhoef, T., eprint arXiv:2010.09598, (October 2020), https://arxiv.org/abs/2010.09598
Ecology
- Improving estimations of life history parameters of small animals in mesocosm experiments: A case study on mosquitoes, Dellar, M., Sam, B.P., Holmes, D., Methods in Ecology and Evolution, (February 2022), https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.13814
Leiden researchers and their use of HPC
- Identifying Earth-impacting asteroids using an artificial neural network. John D. Hefele, Francesco Bortolussi and Simon Portegies Zwart. Astronomy & Astrophysics, February 2020.
News articles featuring ALICE
- Hazardous Object Identifier: Supercomputer Helps to Identify Dangerous Asteroids, Oliver Peckham, HPCwire, 04 March 2020, link
- Elf reuzestenen op ramkoers met de aarde? [Eleven giant rocks on a collision course with Earth?], Annelies Bes, 13 February 2020, Kijk Magazine, link
- Leidse sterrenkundigen ontdekken aardscheerders-in-spé [Leiden astronomers discover would-be Earth-grazers], NOVA, 12 February 2020, link