Getting started with HPC

This documentation is for researchers who have an HPC account and need to know how to connect, log in, transfer data and run jobs on HPC.
[[Category:User Guides]]
An HPC account is required to log into and use HPC resources. To apply for an HPC account see the Applying for an HPC Account page.
{{:What is HPC?}}
{{:What is ALICE?}}
{{:What_ALICE_is_not}}
{{:What does a typical workflow look like?}}
  
==[[Overview]]==
{{:What is the next step?}}
===[[Cluster]]===
 
HPC is a collection of computers and disk arrays that are connected via fast networks that allow USC researchers to run programs at a larger scale than they would be able to on a laptop or lab computer.
 
 
 
The schematic below gives you a general idea of how these parts connect.
 
 
 
 
 
 
 
When using HPC, you will notice several differences from the desktop or laptop environment with which you may be familiar.
 
 
 
HPC’s interface is command-line driven – there is no graphical user interface.
 
HPC nodes run the Linux CentOS operating system and not Windows or Mac OS.
 
You must submit your programs to a remote batch processing system to run them – although there is a way to test them interactively before they are submitted.
 
 
 
===[[Workflow]]===
 
The workflow for using HPC typically consists of the following steps:
 
 
 
====[[Log_in_to_an_HPC_Login_(or_Head)_Node|Log in to the HPC login/head node.]]====
 
====[[Organize workspace.]]====
 
====[[Transfer data and files.]]====
 
====[[Install/run software on HPC.]]====
 
====[[Test your job interactively on a compute node.]]====
 
====[[Submit your job to the batch processor, to run it remotely on a compute node.]]====
 
====[[Monitor your job and check your results when it has completed.]]====
 
 
 
[[Log in to an HPC Login (or Head) Node]]
 
To log into HPC, you will need to use a secure shell client. This is a small application that enables you to connect to a remote computer via SSH (Secure SHell), a cryptographic network protocol.
 
 
 
NOTE: HPC does not manage your USC NetID password. If you are having difficulty using your USC NetID and/or password, please contact the ITS Customer Support Center at 213-740-5555.
 
 
 
SSH (Secure Shell) Login
 
On Windows
 
Windows users will need to download a third-party secure shell client to connect to HPC. You may use the client that works best for you. Here are a few of the most popular clients:
 
X-Win32, which is available through the USC software website. This USC-licensed version has a pre-configured connection to HPC.
 
PuTTY, which is a popular third-party client that may be downloaded through the developer’s website. When configuring PuTTY, you will need to enable the Connection=>SSH=>X11=>“Enable X forwarding” option to allow HPC to send graphical displays to your laptop.
 
To connect to HPC using any SSH client, download and install the client and then launch a connection window. You will be asked to provide your USC NetID and password.
 
 
 
NOTE:  If you are not using X-Win32, which is preconfigured to connect to HPC, you will need to enter the following hostname in order to connect to HPC: hpc-login3.usc.edu (or hpc-login2.usc.edu).
 
 
 
On Mac OS X
 
Mac OS X users can connect to HPC using the Terminal application that is native on these systems. The Terminal application may be found in the Utilities group on the Mac OS X dashboard or in the Finder menu under Go and then Utilities. To connect to HPC using Terminal, open a window and type:
 
ssh YOUR_USC_NetID@hpc-login3.usc.edu
 
(or hpc-login2.usc.edu), where YOUR_USC_NetID is the username portion of your USC email address (the part before @usc.edu). You will then be prompted to enter your USC NetID password.
 
 
 
To enable graphical displays, you will need to install the X11.app from XQuartz and use the -Y flag when logging into HPC, i.e.:
 
 
 
ssh -Y YOUR_USC_NetID@hpc-login3.usc.edu
 
Duo Two-Factor Authentication (2FA)
 
Duo 2FA is required to access HPC. If you have not already signed up for Duo on your USC NetID account, please go to itservices.usc.edu/duo/enroll to enroll. For more information on using Duo with your HPC account, see the Duo Two-Factor Authentication page.
 
 
 
 
 
Organize Directories
 
File System
 
Always work in your project directory! All HPC account holders are assigned two directories where they can store files and run programs, referred to as the home directory and the project directory. Your home directory provides 1 gigabyte (GB) of disk space and is strictly for personal configurations and settings. You will always start in your home directory when you log into HPC.
 
 
 
 
 
 
 
Your project directory is larger and will be the directory you use for most HPC work. This will also be where you will collaborate with your group. Its disk space can vary. Every user will have their own subdirectory within their group’s project directory where they can store data files. Users affiliated with multiple HPC projects will have multiple project directories so they can easily share their files with the appropriate groups.
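
For example, once you know the path to your project directory (the myquota command described below will show it), you can move into your personal subdirectory with the cd command. The group and user names in this path are hypothetical placeholders:

cd /home/rcf-proj2/my_group/ttrojan
pwd    # print the current working directory to confirm where you are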
 
 
 
Limits on Disk Space and Number of Files
 
HPC is a shared resource so there are quotas on usage to help ensure fair access. There are quotas on the number of files stored and the amount of disk space used.
 
 
 
To check your assigned disk space quota, type the command myquota. It will return results similar to the following:
 
 
 
------------------------------------------
 
 
 
Disk Quota for /home/rcf-40/ ID 268648
 
 
 
            Used      Soft    Hard
 
 
 
    Files  1905      100000  101000
 
    Bytes  360.38M  1.00G    1.00G
 
 
 
------------------------------------------
 
 
 
Disk Quota for /home/rcf-proj2/ ID 735
 
 
 
            Used    Soft    Hard
 
 
 
    Files  645157  1000000  1001000
 
    Bytes  106.90G  1.00T    1.02T
 
 
 
------------------------------------------
 
In this example, the user has access to two different directories. The first is their home directory, which has room for up to 100,000 files and 1GB of data. The second is their project directory, which has room for up to 1,000,000 files and 1TB of data. Quotas for project directories are shared amongst all group members. If you exceed these limits, you may receive a "disk quota exceeded" or other type of error. If your project directory becomes full, please send email to hpc@usc.edu for assistance. Please note that HPC is unable to increase your home directory's quota.
 
 
 
The myquota command is handy if you forget where your project directory is. You can also use the Checking Your Quotas page to check quotas.
 
 
 
HPC also assigns each project and project member a directory in another location called "staging". If you need access to a large amount of temporary storage or need a high-performance (parallel) file system, you can keep data in /staging. This area is not backed up and is cleared out about every six months, so don't store data here unless it's easily reproducible or there is a secondary copy elsewhere.
 
 
 
Compute Time Quota
 
Every project is allocated a default amount of compute time. To check the amount of compute time you have available, use the command mybalance -h. The -h flag gives the results in hours. It will return results similar to:
 
 
 
mybalance -h
 
 
 
Account:lc_hpcc Allocated CPU-Hours:5000 Used CPU-Hours:612 Available CPU-Hours:4388
 
Your compute usage is measured in units of CPU-hours. If you request one CPU core for 1 hour, then you consume 1 core x 1 hour = 1 core-hour. If you request 8 CPU cores for 2 hours, then you consume 8 cores x 2 hours = 16 core-hours. Note that if you leave off the -h in the command above, the results will be displayed in core-minutes.
 
 
 
It is a good idea to check this balance before submitting a large job. Your project PI may request additional core-hours at no cost through the project website (link). If you are in multiple research groups, make sure that you keep track of which project you request compute time for, or you may consume compute resources for one group while doing work for another.
 
 
 
 
 
Transfer Files to HPC
 
HPC has a dedicated data transfer node (DTN), hpc-transfer.usc.edu, that is configured for fast file transfers. HPC-transfer is also a Globus endpoint. Use HPC-transfer instead of an HPC-login node when logging in to transfer files. Always transfer files into your project or staging directories where you have sufficient disk space.
 
 
 
Between your laptop and HPC
 
One of the easiest ways to transfer files is to use a utility like FileZilla, a Secure File Transfer Protocol (SFTP) client. It's also possible to use the command-line utility scp. You can find a detailed guide on how to install and use these tools at itservices.usc.edu/sftp/. Remember to use hpc-transfer.usc.edu as the hostname when you transfer files.
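
As a minimal sketch, here is how transferring files with scp from a terminal on your own computer could look (the file names and the project path are hypothetical placeholders):

# Copy a local file to your project directory on HPC
scp mydata.csv YOUR_USC_NetID@hpc-transfer.usc.edu:/home/rcf-proj2/my_group/YOUR_USC_NetID/

# Copy a results file from HPC back to the current directory on your laptop
scp YOUR_USC_NetID@hpc-transfer.usc.edu:/home/rcf-proj2/my_group/YOUR_USC_NetID/results.txt .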
 
 
 
From the Internet to HPC
 
You can transfer a file from the Internet directly to your project directory on HPC (without first downloading it to your laptop). For example, if you want to transfer a repository from GitHub, use the command git clone REPOSITORY_URL, where REPOSITORY_URL is the link you copied from GitHub. If you want to transfer a file from a web page, you can use the command wget URL. If you need to transfer data from a private location (i.e., one that requires logging in), the site may or may not allow you to use wget for the transfer.
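
For example (both URLs below are placeholders rather than real data sources):

# Clone a repository from GitHub into the current directory
git clone https://github.com/username/example-repo.git

# Download a single file from a web page
wget https://example.com/data/sample_dataset.tar.gz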
 
 
 
If you need to frequently transfer files, plan to move large amounts of data, or need assistance transferring data from a private location, feel free to contact us at hpc@usc.edu for advice on how to do this efficiently.
 
 
 
Creating and Editing Files on HPC
 
You can always create files on your personal computer and transfer them to HPC but sometimes it is easiest to create them directly on HPC.
 
 
 
HPC supports the vi/vim, gedit, nano and emacs text editors. Nano is used in HPC training sessions because it is an easy editor to learn. Gedit is a good option if you log in with "X11 forwarding" enabled, which is pre-configured on USC's version of X-Win32 and enabled by XQuartz's X11.app on Mac OS. Vi/vim, which comes standard on all UNIX/Linux machines, and emacs, which is a popular coding environment, both have steeper learning curves.
 
 
 
To edit a file, simply type the editor's name, e.g., nano or gedit, at the command line, followed by the name of the file you want to create or edit, and then type in your file's text.
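
For example, to create or edit a (hypothetical) script named myscript.sh with nano, type:

nano myscript.sh

Within nano, Ctrl+O writes the file to disk and Ctrl+X exits the editor.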
 
 
 
 
 
Install/Run Software on HPC
 
Once you are logged in you can use software, work with files, run brief tests, or submit Slurm scripts to the job queue. The login nodes are a shared resource so be careful not to do tasks that will impact other users. If your usage impacts other users, we may terminate your process without warning.
 
 
 
Installing Your Own Software on HPC
 
Researchers are encouraged to install any software, libraries, and packages necessary for their work. HPC has a presentation on how to approach installing software on HPC. See https://hpcc.usc.edu/support/hpcc-computing-workshops/.
 
 
 
Using HPC-Maintained Software
 
HPC maintains software, compilers, and libraries in the directory /usr/usc. These are programs that support distributed computing, are licensed for use on HPC, or are commonly used by HPC researchers. A current listing of software in /usr/usc/ shows the following:
 
 
 
[ttrojan@hpc-login3 ~]$ cd /usr/usc
 
[ttrojan@hpc-login3 usc]$ ls
 
 
 
acml          fftw        iperf          mvapich2        R
 
amber        gaussian    java          NAMD            root
 
aspera        gcc_wrap    jdk            ncview          ruby
 
bazel        gflags      julia          netcdf          sas
 
bbcp          git        lam            netcdf-fortran  schrodinger
 
bin          globus      lammps        nwdb            sgems
 
boost        glog        leveldb        opencv          singularity
 
caffe        gnu        libroadrunner  OpenGeos        stata
 
cellprofiler  graph-tool  llvm          openmpi        subversion
 
CGAL          gromacs    lmod          papi            swig
 
clang        gurobi      lua            patchelf        taxila
 
cmake        hadoop      magma          perl            tdk
 
conf          hdf5        mathematica    petsc          tensorflow
 
cuda          hdfview    matlab        pgi            udunits
 
cuDNN        hello_usc  mkl            protobuf        valgrind
 
cula          hpctoolkit  modulefiles    python          VisIt
 
dict          igraph      mongo2k        qchem
 
dmtcp        imp        mpich          qespresso
 
etc          imsl        mpich2        qiime
 
fdtd          intel      mpich-mx      QT
 
As an example, we'll look at a program named hello_usc. Let's see what's inside /usr/usc/hello_usc/.
 
 
 
[ttrojan@hpc-login3 usc]$ cd /usr/usc/hello_usc/
 
[ttrojan@hpc-login3 hello_usc]$ ls
 
1.0  2.0  3.0  default
 
[ttrojan@hpc-login3 hello_usc]$
 
We have three versions of this program. We keep multiple versions of most software for compatibility. There is also a "version" called default, which is a pointer to the HPC-recommended version, usually the most recent one. To see which version default points to, type the command ls -l.
 
 
 
[ttrojan@hpc-login3 hello_usc]$ ls -l
 
total 28
 
drwxr-xr-x 3 root root 4096 Sep 28  2016 1.0
 
drwxr-xr-x 3 root root 4096 Sep 28  2016 2.0
 
drwxr-xr-x 3 root root 4096 Sep 28  2016 3.0
 
lrwxrwxrwx 2 root root    3 Apr 24  2015 default -> 3.0
 
Let’s try using the most recent version, 3.0. Within that directory, you should see another directory named bin and two “setup” scripts.
 
 
 
[ttrojan@hpc-login3 hello_usc]$ ls 3.0
 
bin  setup.csh  setup.sh
 
By convention, executable programs are usually installed in a subdirectory named bin. If you list the contents of this subdirectory, you will find a program named hello_usc.
 
 
 
[ttrojan@hpc-login3 hello_usc]$ ls 3.0/bin/
 
hello_usc
 
To use this program on HPC, you must first run a setup script. Setup scripts enable you to run software by modifying your runtime environment. You will find two scripts, setup.sh and setup.csh, for every software version. The .sh and .csh suffixes indicate that the scripts are compatible with the bash and csh shells (and their derivatives), respectively. Each has its own syntax. Bash is the default shell on HPC, so unless you have requested to change your default shell, you will always use setup.sh. For more information about Linux shells, see [link].
 
 
 
To run the setup script, use the source command.
 
 
 
[ttrojan@hpc-login3 hello_usc]$ hello_usc
 
-bash: hello_usc: command not found
 
[ttrojan@hpc-login3 hello_usc]$ source /usr/usc/hello_usc/3.0/setup.sh
 
[ttrojan@hpc-login3 hello_usc]$ hello_usc
 
 
 
    Hello USC!!!.
 
    I am version 3.0 running on host: hpc-login3
 
If you decide that you’d like to use a different version of the program, you can log out and log back in to reset your environment.
 
 
 
NOTE: While it is possible to add a source statement directly to your login script (.bashrc or .cshrc) so that a program's environment is set up automatically every time you log in, HPC recommends against doing so, as multiple setup scripts may conflict with each other and cause unexpected behavior. We recommend that you put these source statements in each specific job script, which will also be helpful when troubleshooting problems with your environment.
 
 
 
 
 
Test your Job
 
We recommend that you first test your job interactively on a compute node before submitting it to the batch system, so that you can be confident it will run correctly and produce the results you expect. You can do this by requesting an "interactive session", which enables you to reserve one or more compute nodes that you can use without impacting other users.
 
 
 
To request an interactive session, use the command salloc. In the example below, four processors are requested for one hour.
 
 
 
salloc --ntasks=4 --time=1:00:00
 
After running the command, the job scheduler will add your job to the wait queue. You should see a message similar to the following (where 3271 is the job id).
 
 
 
salloc: Pending job allocation 3271
 
salloc: job 3271 queued and waiting for resources
 
When the requested resources become available and are allocated to you, you should see more messages. The Slurm prolog is displayed when the job begins.
 
 
 
salloc: job 3271 has been allocated resources
 
salloc: Granted job allocation 3271
 
salloc: Waiting for resource configuration
 
salloc: Nodes hpc1407 are ready for job
 
----------------------------------------
 
Begin SLURM Prolog Fri Mar 16 15:07:29 2018
 
Job ID:        3271
 
Username:      ttrojan
 
Accountname:  lc_hpcc
 
Name:          sh
 
Partition:    quick
 
Nodes:        hpc1407
 
TasksPerNode:  4
 
CPUSPerTask:  Default[1]
 
TMPDIR:        /tmp/3271.quick
 
Cluster:      uschpc
 
HSDA Account:  false
 
Note: Settings of SLURM_EXPORT_ENV=NONE are cleared prior to running job-steps
 
End SLURM Prolog
 
----------------------------------------
 
Once your job starts you can test out your programs or scripts to make sure they work properly on HPC. If there is a problem that you cannot resolve, send email to hpc@usc.edu for assistance. Once you are confident that you know how your program will behave, you are ready to try out submitting a job through the batch scheduler.
 
 
 
 
 
Submit your Job
 
A job consists of all commands, data, scripts and programs that will be used to obtain results. Jobs are submitted to HPC's batch processing system (SLURM), which performs the following functions (a sample job script is sketched after this list):
 
 
 
Schedules user-submitted jobs
 
Allocates user-requested computing resources
 
Processes user-submitted jobs
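
The helloUSC.sh script submitted in the example below is not shown elsewhere in this guide, so here is a minimal sketch of what such a Slurm batch script could look like. The resource values are illustrative, and the source line reuses the hello_usc setup script from the software section above; adapt both to your own program:

#!/bin/bash
#SBATCH --ntasks=1            # number of tasks (processes) to run
#SBATCH --cpus-per-task=1     # CPU cores per task
#SBATCH --time=00:10:00       # walltime limit (HH:MM:SS)
#SBATCH --job-name=helloUSC   # job name shown by squeue

# Set up the program's environment inside the job script, as recommended above
source /usr/usc/hello_usc/3.0/setup.sh

# Run the program on the allocated compute node
hello_usc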
 
To submit a job script, use the sbatch command; Slurm responds with the job ID it has assigned. You can then check the status of your jobs with squeue.

sbatch helloUSC.sh
 
Submitted batch job 3291
 
squeue
 
            JOBID PARTITION    NAME    USER ST      TIME  NODES NODELIST(REASON)
 
              3176    quick job05010  bbruin PD      0:00      1 (Resources)
 
              3291    quick helloUSC  ttrojan PD      0:00      1 (Priority)
 
              3181      scec 2018_03_  ggeagle  R 1-01:10:58      5 hpc[4192-4196]
 
Jobs submitted to the system are processed remotely. The process is recorded and written to an output file which, by default, is named slurm-<JobID>.out (for example, slurm-3291.out for the job submitted above).
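
Once the job has finished, you can inspect this output file with standard commands. For example, using the job ID from the sbatch example above and assuming Slurm's default output-file naming:

cat slurm-3291.out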
 
 
 
NOTE: Head nodes (hpc-login2, hpc-login3, and hpc-transfer) are shared resources that are used by many users simultaneously. Compute nodes currently are not shared (this may differ on private nodes). You may run short tests on the head nodes but beyond that you will need to use the compute nodes. HPC has only three head nodes and almost 3000 compute nodes!
 
 
 
See the documentation at https://hpcc.usc.edu/support/documentation/slurm/ for instructions on requesting job resources, creating and submitting a job script, and monitoring your job under Slurm.
 
 
 
 
 
Monitor your job
 
There are several commands you can use to monitor a job after it has been submitted.
 
 
 
To see if your job has been queued:
 
The first thing you’ll want to check is if your job request was queued. Use the squeue command to view the status of your jobs:
 
 
 
squeue --user username
 
Each job is assigned a unique job identifier (Job ID). It is sufficient to use only the numeric portion of the job id when referencing a job or submitting a ticket.
 
 
 
In the example below, job 3271 has been placed in the "quick" partition (PARTITION) based on its requested time of 1 hour. It has been running for 35 minutes and 58 seconds (TIME). The job requested 4 tasks and was allocated 1 node (NODES). The status (ST) of the job is "R" (running).
 
 
 
squeue --user ttrojan
 
            JOBID PARTITION    NAME    USER ST      TIME  NODES NODELIST(REASON)
 
              3271    quick      sh  ttrojan  R      35:58      1 hpc1407
 
 
 
 
 
 
To see when your job will start:
 
You can also use the squeue command to determine when your job will start:
 
 
 
squeue --start -j job_id
 
To cancel your job and remove from queue:
 
If you wish to delete your job from the queue, you can use the scancel command. Your job may remain visible in the queue for a short while as it finishes being cleaned up.
 
 
 
scancel job_id
 
Getting Help
 
If you need additional assistance getting started with HPC, please see our Getting Help page for information on online and in-person HPC assistance.
 


What is HPC?

“High-Performance Computing” (HPC) is computing on a “supercomputer”, a computer at the frontline of contemporary processing capacity – particularly speed of calculation and available memory.

While the supercomputers in the early days (around 1970) used only a few processors, in the 1990s machines with thousands of processors began to appear and, by the end of the 20th century, massively parallel supercomputers with tens of thousands of “off-the-shelf” processors were the norm. A large number of dedicated processors are placed in close proximity to each other in a computer cluster.

A computer cluster consists of a set of loosely or tightly connected computers that work together so that in many respects they can be viewed as a single system.

The components of a cluster are usually connected to each other through fast local area networks (“LAN”) with each node (computer used as a server) running its own instance of an operating system. Computer clusters emerged as a result of the convergence of a number of computing trends including the availability of low-cost microprocessors, high-speed networks, and software for high performance distributed computing.

Compute clusters are usually deployed to improve performance and availability over that of a single computer, while typically being more cost-effective than single computers of comparable speed or availability.

Nowadays, supercomputers play an important role in a large variety of areas where computationally intensive problems have to be solved. This is not just limited to the computational and natural sciences (Physics, Astronomy, Chemistry and Biology), but also includes the social and medical sciences, mathematics and much more.

What is ALICE?

ALICE is a collection of computers with Intel CPUs, running a Linux operating system, shaped like pizza boxes and stored above and next to each other in racks, interconnected with copper and fibre cables. Their number-crunching power is (presently) measured in tens of trillions of floating-point operations per second (teraflops).

ALICE relies on parallel-processing technology to offer LU and LUMC researchers an extremely fast solution for all their data processing needs.

ALICE is a shared resource, which means that it is used by multiple users at the same time. It utilizes a state-of-the-art management system to make sure that each user can get the best out of ALICE. Naturally, there are limits to ensure that all users get a fair share of the available resources. However, a great deal of responsibility also lies with you as a user to make sure that resources are available for everyone.

Here is a summary of what ALICE currently looks like: Overview of the cluster

What ALICE is not

ALICE is not a magic computer that automatically:

  1. runs your PC-applications much faster for bigger problems;
  2. develops your applications;
  3. solves your bugs;
  4. does your thinking;
  5. . . .
  6. allows you to play games even faster.

ALICE does not replace your desktop computer.

What does a typical workflow look like?

A typical workflow looks like this (a command-level sketch follows the list):

  1. Connect to the login nodes with SSH.
  2. Transfer your files to the cluster
  3. Optional: compile your code and test it
  4. Create a job script and submit your job
  5. Get some coffee and be patient:
    • Your job gets into the queue
    • Your job gets executed
    • Your job finishes
  6. Study the results generated by your jobs, either on the cluster or after downloading them locally.
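
As a command-level sketch of steps 1 to 4 (the hostname, user name and file names below are placeholders rather than actual ALICE addresses; see the rest of this guide for the real connection details):

# 1. Connect to a login node
ssh username@login.alice.example.org

# 2. Transfer your input files to the cluster (run this on your own machine)
scp input.dat username@login.alice.example.org:~/myproject/

# 4. Create a job script and submit it to the scheduler
sbatch myjob.sh

# 5. Check on the job while you wait for it to run and finish
squeue --user username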

What is the next step?

If you think that ALICE is a useful tool to support your computational needs, we encourage you to acquire an ALICE account. You can find information about how to get one here: Getting an account. We also recommend that you continue reading through the User Guide, which will help you get started with your first job(s) on ALICE. Do not hesitate to contact the ALICE staff for help.