Actions

SLURM-Job in Queue

From ALICE Documentation

Revision as of 14:19, 29 June 2020 by Deuler (talk | contribs) (squeue to the max)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Job in Queue

Sometimes a long queue time is an indication that something is wrong or the cluster could simply be busy. You can check to see how much longer your job will be in the queue with the command:

squeue --start --job <job_id>

Please note that this is only an estimate based on current and historical utilization and results can fluctuate. Here is an example of using squeue with the start and job options.

[me@nodelogin01~]$ squeue --start --job 384
 JOBID PARTITION 	NAME 	USER ST          START_TIME  NODES SCHEDNODES   NODELIST(REASON)

  384      main    star-lac     user PD 2018-02-12T16:09:31  	 2 (null)       (Resources)

In the above example, the job is in a pending to run state, because there are no resources available that will allow it to launch. The job is expected to start at approximately 16:09:31 on 02-12-2018. This is an estimation, as jobs ahead of it may complete sooner, freeing up necessary resources for this job. If you believe there is a problem with your job starting, and have checked your scripts for typos, send email to helpdesk@alice.leidenuniv.nl. Let us know your job ID along with a description of your problem and we can check to see if anything is wrong.

squeue to the max

squeue has extended functionality which can be of use if you are wondering about the place your jobs has in the waiting list. There are lost of options available:

 # squeue -p cpu-long -o %all
 ACCOUNT|TRES_PER_NODE|MIN_CPUS|MIN_TMP_DISK|END_TIME|FEATURES|GROUP|OVER_SUBSCRIBE|JOBID|NAME|COMMENT|TIME_LIMIT|MIN_MEMORY|REQ_NODES|COMMAND|PRIORITY|QOS|REASON||ST|USER|RESERVATION|WCKEY|EXC_NODES|NICE|S:C:T|JOBID|EXEC_HOST|CPUS|NODES|DEPENDENCY|ARRAY_JOB_ID|GROUP|SOCKETS_PER_NODE|CORES_PER_SOCKET|THREADS_PER_CORE|ARRAY_TASK_ID|TIME_LEFT|TIME|NODELIST|CONTIGUOUS|PARTITION|PRIORITY|NODELIST(REASON)|START_TIME|STATE|UID|SUBMIT_TIME|LICENSES|CORE_SPEC|SCHEDNODES|WORK_DIR
 bio|N/A|1|0|2020-07-02T12:57:00|(null)|bio|OK|24791|Omma_R_test|(null)|7-00:00:00|0||/data/vissermcde/Ommatotriton/Konstantinos_dataset/run_R.sh|0.00010384921918|normal|Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions||PD|vissermcde|(null)|(null)||0|*:*:*|24791|n/a|1|1||24791|1491|*|*|*|N/A|7-00:00:00|0:00||0|cpu-long|446029|(Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions)|2020-06-25T12:57:00|PENDING|1585|2020-06-24T12:32:22|(null)|N/A|node010|/data/vissermcde/Ommatotriton/Konstantinos_dataset

From above you read that this job is planed to execute on node010 (SCHEDNODES) and that this job will start at or earlier than 2020-06-25T12:57:00 (START_TIME).

One can also print just two/a few items:

 # squeue -p cpu-long -o "%u|%S"
 USER|START_TIME
 vissermcde|2020-06-25T12:57:00