Difference between revisions of "ALICE User Documentation Wiki"

From ALICE Documentation

Line 2: Line 2:
 
__NOTOC__
 
__NOTOC__
  
ALICE is the computing facility for excellent research of Leiden University. With ALICE you have the world of computing at your fingertips. On this wiki, you can find the information you'll need to get started and become more skilled in using computing to support your research.
+
'''Welcome to the ALICE HPC user documentation.'''
  
For background information, read the [[About ALICE]] page.  
+
ALICE is a computing facility for state-of-the-art research and education of Leiden University. With ALICE you have the world of computing at your fingertips. On this wiki, you can find the information you'll need to get started and become more skilled in using computing to support your research and education.  
  
 +
We appreciate any questions or comments on the content of the documentation so that we can improve the range of information that we supply here.
  
Please know that this wiki is currently a work in process. We appreciate any questions or comments on the contents so that we can improve the range of information that we supply here.   
+
If you are unsure about where to go next, have a look below.
  
 +
==What is ALICE?==
 +
Please check out the [[About ALICE]] pages to get some background information, a quick overview and see how to acknowledge it.
  
''IMPORTANT NOTE: ALICE is still in a build-up phase. Configurations are still subject to change. You may, therefore, experience unexpected behaviour for the time being.''
+
==What's new with ALICE?==
 +
To get information about updates, upgrades, events, planned maintenance and more, have a look at the [[News]] page.  
  
 +
Here is the most recent news:
  
Use of the ALICE cluster must be acknowledged in any and all publications. For more info see: [[About ALICE]]
+
{{:Latest News}}
 
 
  
 
----
 
----
{{:Maintenance announcements}}
+
{{:Next Maintenance}}
 
----
 
----
  
==Getting Started==
+
==Just Getting Started?==
If you're new to ALICE, please check out the [[Getting Started]] page.
+
If you're new to ALICE, please check out the [[User_Guides|User Guide]].
 
 
==[[Gaining access]]==
 
Access to the cluster and file transfer are done via SSH and SCP/SFTP. Select one of the below links for more detail or click on the heading of the paragraph for a full overview.
 
 
 
*[[Gaining_access#Account|Getting an account]]
 
*[[Gaining_access#Login|Login to ALICE]]
 
*[[Gaining_access#Key_based_authentication|Public key authentication]]
 
*[[Gaining_access#File_systems|Accessible file storage]]
 
*[[Gaining_access#File_transfer_to_and_from_ALICE|File transfer]]
 
 
 
==[[Policies#Access policy|Access Policy]]==
 
Access needs to be granted actively (by the creation of an account on the cluster by the ALICE Cluster workgroup). Use of resources is limited by the scheduler. Depending on the availability of queues ('partitions') granted to a user, priority to the system's resources is regulated on the basis of Faculty/institute/PI levels.
 
 
 
==Software==
 
 
 
===Cluster Monitoring Software and Scheduler===
 
ALICE uses Bright Cluster Manager software for overall cluster management and Slurm as the scheduler.
 
 
 
*[[Bright Cluster Manager|Monitor cluster status with BCM]]
 
*[[Ganglia Cluster Monitoring]]
 
*[[Slurm|Compose, submit and manage jobs with Slurm]]
 
*[[Running interactive jobs]]
 
  
===Installed software===
+
==What more can I do with ALICE?==
[[Accessing software|Globally installed software, modules]]<br />
+
If you already have experience with ALICE and/or HPC, have a look at the [[Advanced Guide]] pages. Please note that many of the pages here are still under construction and subject to change.
  
==Cluster configuration==
+
==What else is there about ALICE?==
Find [[hardware description|here a hardware description]] of the ALICE cluster.
+
If you need more information on general topics, such as hardware, storage, and policies, please take a look at the [[Documentation]] pages. Please note that many of the pages here are still under construction and subject to change.
  
 +
==Have a question or feedback on ALICE?==
 +
If you have a question about ALICE, need help with using it or want to give us some feedback, please see the [[Support]] page to know how you can connect with us.
  
==Frequently Asked Questions==
+
==Status of ALICE?==
Find [[faq|here a list of frequently asked questions]] for the ALICE cluster.
+
Would you like to know how busy ALICE is and if all nodes are up, then please have a look at the [[Current Status Overview]].

Revision as of 07:50, 2 November 2020

Off to research computing Wonderland


Welcome to the ALICE HPC user documentation.

ALICE is a computing facility for state-of-the-art research and education of Leiden University. With ALICE you have the world of computing at your fingertips. On this wiki, you can find the information you'll need to get started and become more skilled in using computing to support your research and education.

We appreciate any questions or comments on the content of the documentation so that we can improve the range of information that we supply here.

If you are unsure about where to go next, have a look below.

What is ALICE?

Please check out the About ALICE pages to get some background information, a quick overview and see how to acknowledge it.

What's new with ALICE?

To get information about updates, upgrades, events, planned maintenance and more, have a look at the News page.

Here is the most recent news:

Latest News

  • 25 Nov. 2021 - Testing of news as login messages: To communicate changes, news and announcements regarding ALICE more effectively, we have started testing their inclusion, in abbreviated form, as login messages shown when you log in to one of the login nodes. The format for login messages only allows shortened versions, so the full news items will continue to be available only on the ALICE wiki.
  • 19 Nov. 2021 - Important update to job limits (QOS): Following a review of previous changes, we have made additional adjustments to Slurm's QOS settings, which handle the limits on the amount of resources your job can request for each partition:
    • There is no longer a limit on the number of jobs that you can submit, except for the testing partition.
    • We have introduced limits on the number of CPUs and nodes that can be allocated. Please check the page on Partitions for details.
  • 17 Nov. 2021 - Leiden University network maintenance on 20/21 November: Maintenance on the network of Leiden University will take place on the weekend of 20/21 November. During this time ALICE will continue to run, but in total isolation, i.e., with no internet access. This means that you will not be able to log in to ALICE and jobs cannot, for example, pull code, download data or access license servers. We will try to track the status of ALICE here (Next maintenance) during the maintenance, but University websites such as this wiki might not be reachable.
  • 16 Nov. 2021 - Important update to partitions and QOS. We are working on a general update of the partition system of ALICE to improve the throughput of short and medium-type jobs. However, this update will require a bit more time for evaluation and testing. As an intermediate step, we have made the following changes. If you have any feedback or comments, please contact the ALICE helpdesk.
    • CPU nodes: node001 and node002 have been taken out of the cpu-long partition and node001 has been taken out of the cpu-medium partition. As a result, node001 is now exclusively available for short jobs and node002 for short and medium jobs.
    • GPU nodes: node851 has been taken out of the gpu-long partition. As a result, it is exclusively available to the short and medium partitions.
    • The time limit of the short partitions has been raised to 4h.
    • Each login node has one NVIDIA Tesla T4 which you can now use as part of the testing partition.
    • The number of jobs that users can submit has been increased on all partitions. Please check the page on Partitions for details. (See the news item from 19 Nov. 2021.)
  • 16 Nov. 2021 - New e-mail notification. The content of the e-mail that is automatically sent out by Slurm has been updated. The notification can now handle array jobs and it contains more detailed information on the performance and resources used by your job.
  • 8 Oct. 2021 - Infiniband network back in operation. The broken Infiniband switch has been replaced and the Infiniband network is working again. You can make use of the Infiniband network again for your jobs on the CPU partitions.
  • 8 Oct. 2021 - Node020 and node859 used for testing. Node020 and node859 will be reserved from time to time to continue testing the new BeeGFS storage system.
  • 30 Aug. 2021 - Node020 reserved for testing. We have been working on the configuration of the new BeeGFS storage system. For this purpose, we have reserved node020 for running tests.
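
As an illustration of the 4h time limit on the short partitions mentioned in the 16 Nov. 2021 item, the following minimal Python sketch composes an sbatch command line and refuses walltimes above that limit before anything is submitted. It is only a sketch under stated assumptions: the partition name cpu-short is inferred from the cpu-medium and cpu-long names above and is not confirmed here, job.sh is a placeholder script name, and the helper functions are hypothetical. Please check the page on Partitions for the actual names and limits.

```python
from datetime import timedelta
from typing import List

# Taken from the 16 Nov. 2021 news item: short partitions allow up to 4h.
SHORT_PARTITION_LIMIT = timedelta(hours=4)


def format_walltime(duration: timedelta) -> str:
    """Format a duration as an hours:minutes:seconds string for --time."""
    total_seconds = int(duration.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def build_sbatch_args(
    script: str,
    partition: str,
    walltime: timedelta,
    limit: timedelta = SHORT_PARTITION_LIMIT,
) -> List[str]:
    """Compose an sbatch command line, rejecting walltimes above the limit.

    This helper is hypothetical and only builds the argument list;
    it does not submit anything.
    """
    if walltime > limit:
        raise ValueError(
            f"requested {format_walltime(walltime)} exceeds the "
            f"{format_walltime(limit)} limit for partition {partition!r}"
        )
    return [
        "sbatch",
        f"--partition={partition}",
        f"--time={format_walltime(walltime)}",
        script,
    ]


if __name__ == "__main__":
    # "cpu-short" and "job.sh" are assumed names used for illustration only.
    args = build_sbatch_args("job.sh", "cpu-short", timedelta(hours=3, minutes=30))
    print(" ".join(args))
```

The resulting list can be passed to subprocess.run on a login node, or joined into a single command line as shown in the last lines of the sketch.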

Next Maintenance

Leiden University network maintenance on 20/21 November

Maintenance on the network of Leiden University will take place on the weekend of 20/21 November. The official announcement from the University can be found on the University webpage.

During this time ALICE will continue to run, but in total isolation, i.e., with no internet access. This means that you will not be able to log in to ALICE and jobs cannot, for example, pull code, download data or access license servers.

We will use this page to provide updates on the status of the cluster.

If you have any questions, please contact the ALICE Helpdesk.


Just Getting Started?

If you're new to ALICE, please check out the User Guide.

What more can I do with ALICE?

If you already have experience with ALICE and/or HPC, have a look at the Advanced Guide pages. Please note that many of the pages here are still under construction and subject to change.

What else is there about ALICE?

If you need more information on general topics, such as hardware, storage, and policies, please take a look at the Documentation pages. Please note that many of the pages here are still under construction and subject to change.

Have a question or feedback on ALICE?

If you have a question about ALICE, need help with using it or want to give us some feedback, please see the Support page to find out how to contact us.

Status of ALICE?

If you would like to know how busy ALICE is and whether all nodes are up, please have a look at the Current Status Overview.