Running a command with a maximum time limit

From ALICE Documentation

If you are not sure that a job will finish before it runs out of walltime, and you want to copy its data back in any case, you have to stop the main command before the walltime expires and then copy the data. This can be done with the timeout command, which limits how long a program may run and kills it once that limit is exceeded. Here’s an example job script using timeout:

— timeout.sh —

 #!/bin/bash 
 #PBS -N timeout_example 
 #PBS -l nodes=1:ppn=1 ## single-node job, single core 
 #PBS -l walltime=2:00:00 ## max. 2h of wall time 
  
 # go to temporary working directory (on local disk)
 cd $TMPDIR
 # This command will take too long (1400 minutes is longer than our walltime)
 # $PBS_O_WORKDIR/example_program.sh 1400 output.txt
 
 # So we put it after a timeout command
 # We have a total of 120 minutes (2 x 60) and we instruct the script to run for
 # 100 minutes, but time out after 90 minutes,
 # so we have 30 minutes left to copy files back. This should
 # be more than enough.
 timeout -s SIGKILL 90m $PBS_O_WORKDIR/example_program.sh 100 output.txt
 # copy back output data, ensure unique filename using $PBS_JOBID
 cp output.txt $VSC_DATA/output_${PBS_JOBID}.txt
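
A job script may also want to know whether the program finished by itself or was stopped by timeout. The following is a minimal sketch of how the exit status can be inspected, assuming the GNU coreutils version of timeout; `sleep 5` and the one-second limits are stand-ins chosen so the sketch runs quickly, and the variable names are illustrative only.

```shell
#!/bin/bash
# Sketch: tell "finished in time" apart from "stopped by timeout".
# Assumes GNU coreutils timeout; 'sleep 5' stands in for a long-running program.

# Default signal (SIGTERM): timeout reports status 124 when the limit is hit.
timeout 1s sleep 5
status_term=$?

# With -s KILL the status is 137 (128 + 9) instead of 124.
timeout -s KILL 1s sleep 5
status_kill=$?

# The command finishes before the limit: its own exit status (0) is returned.
timeout 5s true
status_ok=$?

echo "TERM: ${status_term}, KILL: ${status_kill}, finished: ${status_ok}"

if [ "$status_term" -eq 124 ] || [ "$status_kill" -eq 137 ]; then
    echo "the time limit was reached; any output copied back is partial"
fi
```

In the job script above, the same check could be placed right after the timeout line, for example to add a note to the job log that the copied output is incomplete.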

The example program used in this script is a dummy script that simply sleeps for a specified number of minutes:

— example_program.sh —

 #!/bin/bash   
 # This is an example program 
 # It takes two arguments: a number of times to loop and a file to write to  
 # In total, it will run for (the number of times to loop) minutes 
 
 if [ $# -ne 2 ]; then 
     echo "Usage: ./example_program.sh amount filename" && exit 1 
 fi 
 
 for ((i = 0; i < $1; i++ )); do 
     echo "${i} => $(date)" >> "$2" 
     sleep 60 
 done
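
SIGKILL cannot be caught, so a program stopped with `-s SIGKILL` gets no chance to write a final result. As a variation (not what the job script above does), timeout can send its default SIGTERM first and follow up with SIGKILL only after a grace period given with `-k`. The sketch below assumes GNU timeout; `worker.sh` is a hypothetical stand-in for example_program.sh that loops in seconds rather than minutes so it can be tried quickly.

```shell
#!/bin/bash
# Sketch: let the program react to SIGTERM before it is stopped.
# Assumes GNU coreutils timeout; worker.sh is a hypothetical stand-in program.

workdir=$(mktemp -d)

cat > "$workdir/worker.sh" <<'EOF'
#!/bin/bash
out=$1
# On SIGTERM, write a final marker line and exit cleanly.
trap 'echo "interrupted" >> "$out"; exit 0' TERM
for ((i = 0; i < 30; i++)); do
    echo "${i} => $(date)" >> "$out"
    # Sleep in the background and wait for it: the wait builtin is
    # interrupted by a trapped signal, so the handler runs right away.
    sleep 1 &
    wait $!
done
EOF

# Send SIGTERM after 2 seconds; send SIGKILL if still running 5 seconds later.
timeout -k 5s 2s bash "$workdir/worker.sh" "$workdir/output.txt"
status=$?

echo "timeout exit status: ${status}"
tail -n 1 "$workdir/output.txt"
```

The exit status is still 124 here even though the worker exits with 0 from its handler, because timeout reports that the limit was reached.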