
Getting started with HPC

From ALICE Documentation

What is HPC?

“High-Performance Computing” (HPC) is computing on a “supercomputer”, a computer at the frontline of contemporary processing capacity – particularly speed of calculation and available memory.

While the supercomputers in the early days (around 1970) used only a few processors, in the 1990s machines with thousands of processors began to appear and, by the end of the 20th century, massively parallel supercomputers with tens of thousands of “off-the-shelf” processors were the norm. A large number of dedicated processors are placed in close proximity to each other in a computer cluster.

A computer cluster consists of a set of loosely or tightly connected computers that work together so that in many respects they can be viewed as a single system.

The components of a cluster are usually connected to each other through fast local area networks (“LAN”) with each node (computer used as a server) running its own instance of an operating system. Computer clusters emerged as a result of the convergence of a number of computing trends including the availability of low-cost microprocessors, high-speed networks, and software for high performance distributed computing.

Computer clusters are usually deployed to improve performance and availability over that of a single computer, while typically being more cost-effective than single computers of comparable speed or availability.

Nowadays, supercomputers play an important role in a large variety of areas where computationally intensive problems have to be solved. This is not just limited to the computational and natural sciences (Physics, Astronomy, Chemistry and Biology), but also includes the social and medical sciences, mathematics and much more.

What is ALICE?

ALICE is a collection of computers with Intel CPUs, running a Linux operating system, shaped like pizza boxes and stored above and next to each other in racks, interconnected with copper and fibre cables. Their number-crunching power is (presently) measured in tens of trillions of floating-point operations (teraflops).

ALICE relies on parallel-processing technology to offer LU and LUMC researchers an extremely fast solution for all their data processing needs.

ALICE is a shared resource system, which means that it is used by multiple users at the same time. It utilizes a state-of-the-art management system to make sure that each user can get the best out of ALICE. Naturally, there are limits to ensure that all users get a fair share of the available resources. However, a great deal of responsibility also lies with you as a user to make sure that resources remain available for everyone.

Here is a summary of what ALICE currently looks like: Overview of the cluster

What ALICE is not

ALICE is not a magic computer that automatically:

  1. runs your PC-applications much faster for bigger problems;
  2. develops your applications;
  3. solves your bugs;
  4. does your thinking;
  5. . . .
  6. allows you to play games even faster.

ALICE does not replace your desktop computer.

What does a typical workflow look like?

A typical workflow looks like this:

  1. Connect to the login nodes with SSH.
  2. Transfer your files to the cluster.
  3. Optional: compile your code and test it.
  4. Create a job script and submit your job.
  5. Get some coffee and be patient:
    • Your job gets into the queue.
    • Your job gets executed.
    • Your job finishes.
  6. Study the results generated by your jobs, either on the cluster or after downloading them locally.
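The steps above can be sketched as a short shell session. This is a minimal illustration only: the hostname, filenames, and resource values are placeholders (not the real ALICE addresses), and it assumes the cluster uses the SLURM scheduler (`sbatch`, `squeue`) — consult the rest of the User Guide for the actual login address and job-script conventions.

```shell
# Step 1-2: connect to a login node and transfer files
# (hostname and paths below are illustrative placeholders)
ssh username@login.alice.example.nl
scp my_script.py username@login.alice.example.nl:~/project/

# Step 4: a minimal job script, e.g. saved as myjob.slurm
#   #!/bin/bash
#   #SBATCH --job-name=myjob
#   #SBATCH --time=01:00:00     # requested walltime (hh:mm:ss)
#   #SBATCH --ntasks=1          # number of tasks
#   #SBATCH --mem=4G            # requested memory
#   python my_script.py

# Submit the job and check its place in the queue (step 5)
sbatch myjob.slurm      # prints the assigned job ID
squeue -u $USER         # shows your pending and running jobs
```

Requesting only the resources your job actually needs (time, tasks, memory) helps the scheduler keep the queue moving for everyone — this is part of the fair-share responsibility mentioned above.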

What is the next step?

When you think that ALICE is a useful tool to support your computational needs, we encourage you to acquire an ALICE-account. You can find information about how to get one here: Getting an account. We also recommend that you continue reading through the User Guide which will help you in getting started with your first job(s) on ALICE. Do not hesitate to contact the ALICE staff for help.