SLURM-Monitor Jobs

Monitoring Your Jobs

To monitor the status of your jobs in the Slurm partitions, use the squeue command. You will only be able to see your own jobs. Options to this command help filter and format the output to meet your needs. See the man page for more information.

 Squeue Option                  Action
 --user=<username>              Lists entries only belonging to username (only available to administrators)
 --jobs=<job_id>                Lists the entry, if any, for job_id
 --partition=<partition_name>   Lists entries only belonging to partition_name

Here is an example of using squeue.

         [me@nodelogin01~]$ squeue
             JOBID PARTITION      NAME     USER ST      TIME  NODES NODELIST(REASON)
             537   cpu-short      helloWor user R       0:47      2 node[004,010]
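
The options above can also be combined. For example, to list only the entry for a specific job in a specific partition (a brief sketch; substitute your own job id and partition name):

 [me@nodelogin01~]$ squeue --partition=cpu-short --jobs=537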
        

The output of squeue provides the following information:

 Column Header       Definition
 JOBID               Unique number assigned to each job
 PARTITION           Partition in which the job is scheduled to run, or is running
 NAME                Name of the job, typically the job script name
 USER                User id of the job owner
 ST                  Current state of the job (see the table below for the meaning of each code)
 TIME                Amount of time the job has been running
 NODES               Number of nodes the job is scheduled to run across
 NODELIST(REASON)    If running, the list of nodes the job is running on; if pending, the reason the job is waiting

Valid Job States

 Code   State         Meaning
 CA     Cancelled     Job was cancelled
 CD     Completed     Job completed
 CF     Configuring   Job resources are being configured
 CG     Completing    Job is completing
 F      Failed        Job terminated with a non-zero exit code
 NF     Node Fail     Job terminated due to failure of one or more nodes
 PD     Pending       Job is waiting for compute node(s)
 R      Running       Job is running on compute node(s)
 TO     Timeout       Job terminated upon reaching its time limit
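
If you only want to see jobs in a particular state, squeue can filter on the state codes above with its --states option. A minimal sketch that lists only pending jobs:

 [me@nodelogin01~]$ squeue --states=PD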

Job in Queue

A long queue time can be an indication that something is wrong, but it may also simply mean that the cluster is busy. You can check how much longer your job is expected to wait in the queue with the command:

squeue --start --job <job_id>

Please note that this is only an estimate based on current and historical utilization, and the result can fluctuate. Here is an example of using squeue with the --start and --job options.

[me@nodelogin01~]$ squeue --start --job 384
  JOBID PARTITION     NAME USER ST          START_TIME  NODES SCHEDNODES NODELIST(REASON)
    384      main star-lac user PD 2018-02-12T16:09:31      2 (null)     (Resources)

In the above example, the job is in a pending state because there are currently no resources available that allow it to launch. The job is expected to start at approximately 16:09:31 on 02-12-2018. This is an estimate, as jobs ahead of it may complete sooner, freeing up the necessary resources for this job. If you believe there is a problem with your job starting, and you have checked your scripts for typos, send an email to helpdesk@alice.leidenuniv.nl. Let us know your job ID along with a description of your problem and we can check whether anything is wrong.

squeue to the max

squeue has extended functionality that can be of use if you are wondering about the place your job has in the waiting list. There are lots of options available:

 # squeue -p cpu-long -o %all
 ACCOUNT|TRES_PER_NODE|MIN_CPUS|MIN_TMP_DISK|END_TIME|FEATURES|GROUP|OVER_SUBSCRIBE|JOBID|NAME|COMMENT|TIME_LIMIT|MIN_MEMORY|REQ_NODES|COMMAND|PRIORITY|QOS|REASON||ST|USER|RESERVATION|WCKEY|EXC_NODES|NICE|S:C:T|JOBID|EXEC_HOST|CPUS|NODES|DEPENDENCY|ARRAY_JOB_ID|GROUP|SOCKETS_PER_NODE|CORES_PER_SOCKET|THREADS_PER_CORE|ARRAY_TASK_ID|TIME_LEFT|TIME|NODELIST|CONTIGUOUS|PARTITION|PRIORITY|NODELIST(REASON)|START_TIME|STATE|UID|SUBMIT_TIME|LICENSES|CORE_SPEC|SCHEDNODES|WORK_DIR
 bio|N/A|1|0|2020-07-02T12:57:00|(null)|bio|OK|24791|Omma_R_test|(null)|7-00:00:00|0||/data/vissermcde/Ommatotriton/Konstantinos_dataset/run_R.sh|0.00010384921918|normal|Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions||PD|vissermcde|(null)|(null)||0|*:*:*|24791|n/a|1|1||24791|1491|*|*|*|N/A|7-00:00:00|0:00||0|cpu-long|446029|(Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions)|2020-06-25T12:57:00|PENDING|1585|2020-06-24T12:32:22|(null)|N/A|node010|/data/vissermcde/Ommatotriton/Konstantinos_dataset

From the output above you can read that this job is planned to execute on node010 (SCHEDNODES) and that it will start at, or earlier than, 2020-06-25T12:57:00 (START_TIME).

You can also print just a few selected fields:

 # squeue -p cpu-long -o "%u|%S"
 USER|START_TIME
 vissermcde|2020-06-25T12:57:00
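
To get an impression of your place in the waiting list, you can also print the job priority and sort on it; the sort specification reuses the same field letters as the output format (a sketch, assuming the cpu-long partition as above):

 [me@nodelogin01~]$ squeue -p cpu-long -o "%i|%u|%Q|%S" -S "-Q"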

Job is Running

Another mechanism for obtaining job information is the command scontrol show job <job_id>. This provides more detail on the resources requested and reserved for your job. It can tell you the status of your job, but not the status of the programs running within the job. Here is an example using scontrol.

[me@nodelogin01~]$ scontrol show job 384
JobId=384 JobName=star-lac
   UserId=ttrojan(12345) GroupId=uscno1(01) MCS_label=N/A
   Priority=1 Nice=0 Account=lc_ucs1 QOS=lc_usc1_maxcpumins
   JobState=PENDING Reason=Resources Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=00:30:00 TimeMin=N/A
   SubmitTime=2018-02-12T15:39:57 EligibleTime=2018-02-12T15:39:57
   StartTime=2018-02-12T16:09:31 EndTime=2018-02-12T16:39:31 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=quick AllocNode:Sid=node-login3:21524
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null) SchedNodeList=node[001,010]
   NumNodes=2-2 NumCPUs=2 NumTasks=2 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=2,mem=2048,node=2
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*
   MinCPUsNode=1 MinMemoryCPU=1G MinTmpDiskNode=0
   Features=[myri|IB] DelayBoot=00:00:00
   Gres=(null) Reservation=(null)
   OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)
   Command=(null)
   WorkDir=/auto/rcf-00/ttrojan
   Power=
[me@nodelogin01~]$
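
If you need even more detail about a job, scontrol also accepts a details flag; a brief sketch:

 [me@nodelogin01~]$ scontrol -d show job <job_id>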

When your job is done, check the log files to make sure everything has completed without incident.

Job Organization

Slurm has some handy features to help you stay organized when you add them to your job script or to the salloc command.

 Syntax                                      Meaning
 --mail-user=<email>                         Where to send email alerts
 --mail-type=<BEGIN|END|FAIL|REQUEUE|ALL>    When to send email alerts
 --output=<out_file>                         Name of the output file
 --error=<error_file>                        Name of the error file
 --job-name=<job_name>                       Job name (will be displayed in squeue output)
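
As an illustration, a job-script header using these options could look like the sketch below (the job name, file names and email address are placeholders; %j is replaced by the job id):

 #!/bin/bash
 #SBATCH --job-name=test_job
 #SBATCH --output=test_job_%j.out
 #SBATCH --error=test_job_%j.err
 #SBATCH --mail-user=me@example.com
 #SBATCH --mail-type=BEGIN,END,FAIL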

Get Job Usage Statistics

Knowing which resources were actually used can help you fine-tune your job scripts and future requests. The

sacct --jobs=<job_id>

command provides usage statistics for jobs that are running, as well as for jobs that have completed.

Output can be filtered and formatted to provide specific information, including requested memory and peak memory used during job execution. See the man pages for more information.


      [me@nodelogin01~]$ sacct --jobs=383 --format=User,JobID,account,Timelimit,elapsed,ReqMem,MaxRss,ExitCode
           User        JobID    Account  Timelimit    Elapsed     ReqMem     MaxRSS ExitCode
      --------- ------------ ---------- ---------- ---------- ---------- ---------- --------
           user 383           lc_alice1   02:00:00   01:28:59        1Gc                 0:0
                383.extern    lc_alice1              01:28:59        1Gc                 0:0
      [me@nodelogin01~]$
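
sacct can also report on all of your recent jobs at once, for example every job started since a given date (a sketch with a placeholder date):

      [me@nodelogin01~]$ sacct --starttime=2020-06-01 --format=JobID,JobName,Partition,Elapsed,MaxRSS,State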

Canceling a Job

Whether your job is running or waiting in the queue, you can cancel it with the scancel <job_id> command. Use squeue if you do not recall the job id.


     [me@nodelogin01~]$ scancel 384
     [me@nodelogin01~]$
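
scancel can also act on more than one job at a time, for example to cancel all of your own jobs, or all jobs with a given name (substitute your own username and job name):

     [me@nodelogin01~]$ scancel --user=<username>
     [me@nodelogin01~]$ scancel --name=<job_name>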

Monitoring the Partitions in the Clusters

To see the overall status of the partitions and nodes in the clusters, run the sinfo command. As with the other monitoring commands, additional options and output formats are available.

 [me@nodelogin01~]$ sinfo
 PARTITION      AVAIL  TIMELIMIT  NODES  STATE NODELIST
 testing           up    1:00:00      2   idle nodelogin[01-02]
 cpu-short*        up    3:00:00      5    mix node[002,005,007,012-013]
 cpu-short*        up    3:00:00      2  alloc node[001,003]
 cpu-short*        up    3:00:00     13   idle node[004,006,008-011,014-020]
 cpu-medium        up 1-00:00:00      5    mix node[002,005,007,012-013]
 cpu-medium        up 1-00:00:00      2  alloc node[001,003]
 cpu-medium        up 1-00:00:00     13   idle node[004,006,008-011,014-020]
 cpu-long          up 7-00:00:00      5    mix node[002,005,007,012-013]
 cpu-long          up 7-00:00:00      2  alloc node[001,003]
 cpu-long          up 7-00:00:00     13   idle node[004,006,008-011,014-020]
 gpu-short         up    3:00:00      6    mix node[852,855,857-860]
 gpu-short         up    3:00:00      4  alloc node[851,853-854,856]
 gpu-medium        up 1-00:00:00      6    mix node[852,855,857-860]
 gpu-medium        up 1-00:00:00      4  alloc node[851,853-854,856]
 gpu-long          up 7-00:00:00      6    mix node[852,855,857-860]
 gpu-long          up 7-00:00:00      4  alloc node[851,853-854,856]
 mem               up   infinite      1  alloc node801
 notebook-cpu      up   infinite      2    mix node[002,005]
 notebook-cpu      up   infinite      2  alloc node[001,003]
 notebook-cpu      up   infinite      1   idle node004
 notebook-gpu      up   infinite      1    mix node852
 notebook-gpu      up   infinite      1  alloc node851
 playground-cpu    up   infinite      2    mix node[002,005]
 playground-cpu    up   infinite      2  alloc node[001,003]
 playground-cpu    up   infinite      1   idle node004
 playground-gpu    up   infinite      1    mix node852
 playground-gpu    up   infinite      1  alloc node851
 [me@nodelogin01~]$
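
If you prefer a node-oriented rather than partition-oriented view, sinfo can also report the state per node; a brief sketch (output omitted here):

 [me@nodelogin01~]$ sinfo --Node --long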

Monitor the Nodes in the Cluster

To get detailed information on a particular compute node, use the scontrol show node=<nodename> command.

     [me@nodelogin01~]$ scontrol show node="node020"
     NodeName=node020 Arch=x86_64 CoresPerSocket=8
        CPUAlloc=16 CPUErr=0 CPUTot=16 CPULoad=1.01
        AvailableFeatures=IB,avx,avx2,xeon,E5-2640v3,nx360
        ActiveFeatures=IB,avx,avx2,xeon,E5-2640v3,nx360
        Gres=(null)
        NodeAddr=node020 NodeHostName=node020 Version=17.02
        OS=Linux RealMemory=63000 AllocMem=16384 FreeMem=45957 Sockets=2 Boards=1
        MemSpecLimit=650
        State=ALLOCATED ThreadsPerCore=1 TmpDisk=0 Weight=16 Owner=N/A MCS_label=N/A
        Partitions=route_queue,quick,main,large,long,testSharedQ,restrictedQ,preemptMeQ,preemptYouQ
        BootTime=2018-02-08T04:08:36 SlurmdStartTime=2018-02-09T12:55:53
        CfgTRES=cpu=16,mem=63000M
        AllocTRES=cpu=16,mem=63000M
        CapWatts=n/a
        CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
        ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
     
     
     [me@nodelogin01~]$
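
To print this information for all compute nodes at once, you can omit the node name (a brief sketch; the output can be long):

     [me@nodelogin01~]$ scontrol show nodes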