SLURM-Job in Queue
From ALICE Documentation
Revision as of 14:14, 29 June 2020 by Deuler
Job in Queue
Sometimes a long queue time is an indication that something is wrong or the cluster could simply be busy. You can check to see how much longer your job will be in the queue with the command:
squeue --start --job <job_id>
Please note that this is only an estimate based on current and historical utilization and results can fluctuate. Here is an example of using squeue with the start and job options.
[me@nodelogin01~]$ squeue --start --job 384 JOBID PARTITION NAME USER ST START_TIME NODES SCHEDNODES NODELIST(REASON) 384 main star-lac user PD 2018-02-12T16:09:31 2 (null) (Resources)
In the above example, the job is in a pending to run state, because there are no resources available that will allow it to launch. The job is expected to start at approximately 16:09:31 on 02-12-2018. This is an estimation, as jobs ahead of it may complete sooner, freeing up necessary resources for this job. If you believe there is a problem with your job starting, and have checked your scripts for typos, send email to firstname.lastname@example.org. Let us know your job ID along with a description of your problem and we can check to see if anything is wrong.
squeue has extended functionality which can be of use if you are wondering about the place your jobs has in the waiting list. There are lost of options available:
# squeue -p cpu-long -o %all ACCOUNT|TRES_PER_NODE|MIN_CPUS|MIN_TMP_DISK|END_TIME|FEATURES|GROUP|OVER_SUBSCRIBE|JOBID|NAME|COMMENT|TIME_LIMIT|MIN_MEMORY|REQ_NODES|COMMAND|PRIORITY|QOS|REASON||ST|USER|RESERVATION|WCKEY|EXC_NODES|NICE|S:C:T|JOBID|EXEC_HOST|CPUS|NODES|DEPENDENCY|ARRAY_JOB_ID|GROUP|SOCKETS_PER_NODE|CORES_PER_SOCKET|THREADS_PER_CORE|ARRAY_TASK_ID|TIME_LEFT|TIME|NODELIST|CONTIGUOUS|PARTITION|PRIORITY|NODELIST(REASON)|START_TIME|STATE|UID|SUBMIT_TIME|LICENSES|CORE_SPEC|SCHEDNODES|WORK_DIR bio|N/A|1|0|2020-07-02T12:57:00|(null)|bio|OK|24791|Omma_R_test|(null)|7-00:00:00|0||/data/vissermcde/Ommatotriton/Konstantinos_dataset/run_R.sh|0.00010384921918|normal|Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions||PD|vissermcde|(null)|(null)||0|*:*:*|24791|n/a|1|1||24791|1491|*|*|*|N/A|7-00:00:00|0:00||0|cpu-long|446029|(Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions)|2020-06-25T12:57:00|PENDING|1585|2020-06-24T12:32:22|(null)|N/A|node010|/data/vissermcde/Ommatotriton/Konstantinos_dataset
From above you read that this job is planed to execute on node010 (SCHEDNODES) and that this job will start at or earlier than 2020-06-25T12:57:00 (START_TIME).
One can also print just those two items:
# squeue -p cpu-long -o "%u|%S" USER|START_TIME vissermcde|2020-06-25T12:57:00