
Getting started with HPC

From ALICE Documentation

Revision as of 14:35, 9 April 2020 by Dijkbvan

What is HPC?

“High-Performance Computing” (HPC) is computing on a “supercomputer”, a computer at the frontline of contemporary processing capacity – particularly speed of calculation and available memory.

While the supercomputers in the early days (around 1970) used only a few processors, in the 1990s machines with thousands of processors began to appear and, by the end of the 20th century, massively parallel supercomputers with tens of thousands of “off-the-shelf” processors were the norm. A large number of dedicated processors are placed in close proximity to each other in a computer cluster.

A computer cluster consists of a set of loosely or tightly connected computers that work together so that in many respects they can be viewed as a single system.

The components of a cluster are usually connected to each other through fast local area networks (“LAN”) with each node (computer used as a server) running its own instance of an operating system. Computer clusters emerged as a result of the convergence of a number of computing trends including the availability of low-cost microprocessors, high-speed networks, and software for high performance distributed computing.

Compute clusters are usually deployed to improve performance and availability over that of a single computer, while typically being more cost-effective than single computers of comparable speed or availability.

Nowadays, supercomputers play an important role in a large variety of areas where computationally intensive problems have to be solved. This is not just limited to the computational and natural sciences (Physics, Astronomy, Chemistry and Biology), but also includes the social and medical sciences, mathematics and much more.

What is ALICE?

ALICE is a collection of computers with Intel CPUs, running a Linux operating system, shaped like pizza boxes and stored above and next to each other in racks, interconnected with copper and fibre cables. Their number-crunching power is (presently) measured in tens of trillions of floating-point operations (teraflops).

ALICE relies on parallel-processing technology to offer LU and LUMC researchers an extremely fast solution for all their data processing needs.

ALICE is a shared resource system, which means that it is used by multiple users at the same time. It utilizes a state-of-the-art management system to make sure that each user can get the best out of ALICE. Naturally, there are limits to ensure that all users get a fair share of the available resources. However, a great deal of responsibility also lies with you as a user to make sure that resources remain available for everyone.

Here is a summary of what ALICE currently looks like: Overview of the cluster

What the HPC infrastructure is not

Is the HPC a solution for my computational needs?

Batch or interactive mode?

Typically, the strength of a supercomputer comes from its ability to run a huge number of programs (i.e., executables) in parallel without any user interaction in real-time. This is what is called “running in batch mode”. It is also possible to run programs on ALICE that require user interaction (pushing buttons, entering input data, etc.). Although technically possible, ALICE might not always be the best and smartest option for running those interactive programs. Each time some user interaction is needed, the computer will wait for user input. The available computer resources (CPU, storage, network, etc.) might not be optimally used in those cases. A more in-depth analysis with the ALICE staff can unveil whether ALICE is the desired solution for running interactive programs. Interactive mode is typically only useful for creating quick visualizations of your data without having to copy your data to your desktop and back.
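In batch mode, work is described in a job script that is handed to the cluster's scheduler, which queues it and runs it unattended once resources are free. As a rough sketch, assuming the management system is Slurm (this section does not name ALICE's scheduler, and the job name, partition, resource values and program below are illustrative placeholders, not real ALICE settings), such a script might look like this:

```shell
#!/bin/bash
# Minimal batch-mode job script (sketch; assumes the Slurm scheduler).
# All names and values below are hypothetical placeholders.
#SBATCH --job-name=my_analysis
#SBATCH --partition=cpu-short      # hypothetical partition name
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=01:00:00
#SBATCH --mem=4G

# Everything here runs unattended: no user interaction is possible.
echo "Running on node: $(hostname)"
./my_program input.dat > output.log   # placeholder executable
```

Such a script would be submitted with `sbatch`, after which the scheduler decides when and where it runs; this is what “without any user interaction in real-time” means in practice.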

What are cores, processors and nodes?

In this manual, the terms core, processor and node will be used frequently, so it’s useful to understand what they are. Modern servers, also referred to as (worker) nodes in the context of HPC, include one or more sockets, each housing a multi-core processor (next to memory, disk(s), network cards, etc.). A modern processor consists of multiple CPUs or cores that are used to execute computations.
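You can inspect this hierarchy yourself on any Linux machine. A small sketch using standard tools (`nproc` from GNU coreutils and `lscpu` from util-linux, both commonly available on cluster nodes):

```shell
# Number of logical cores visible to this shell
nproc

# Sockets, cores per socket and threads per core for this node;
# multiplying the three gives the logical core count reported above.
lscpu | grep -E '^(Socket\(s\)|Core\(s\) per socket|Thread\(s\) per core)'
```

On a cluster, running this inside a job shows the topology of the worker node your job landed on, which may differ from your login node or desktop.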

Parallel or sequential programs?

What programming languages can I use?

What operating systems can I use?

What does a typical workflow look like?

What is the next step?