How to Manage Many Jobs

From HPC users

Introduction

Often, you may need to run many nearly identical jobs. There are different ways to do this with minimal effort, and three of them are described below. Which approach suits your needs best depends on the nature of your problem; hints for making the choice are given along the way.

The examples below use a simple program that decides whether a number is prime. The example program can be found [media:ManyTasks.tgz here]; to use it, download the archive and then run

$ tar -zxvf ManyTasks.tgz             # to unpack
$ cd ManyTasks                        # go to directory
$ make                                # build executable

After that, you can run the program, e.g. with

$ ./isprime 73
yes

to see whether a number, in this case 73, is prime (it is). The idea of the example is to run isprime on every number in the file parameter.dat. If we want to run this as a job, we can therefore think of it as many jobs that are identical except for one parameter. In this context, each of these single jobs is also called a task.

Managing many Tasks in a Single Job

The first approach runs all tasks of the example within a single job. In the job script, a loop runs the tasks one after another:

#!/bin/bash

### SLURM options (others can be added, e.g. for memory)
#SBATCH --partition carl.p

# loop over tasks (reads parameter.dat line by line into variable p)
while read -r p
do
  echo -n "Testing if $p is prime? "
  ./isprime "$p"
done < parameter.dat

The disadvantage of this approach is that only one job runs on the cluster and the tasks are executed one after another. However, if the individual tasks are very short (maybe less than a few minutes) and the number of tasks is not too large (less than 100), this approach can be useful. Note that in a bash script you can also use for-loops, e.g.

...
for ((i=1; i<=100; i++))
do
  # use $i, e.g. to read line i from parameter.dat
  p=$(awk "NR==$i {print \$1}" parameter.dat)
  echo -n "Testing if $p is prime? "
  ./isprime "$p"
done
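Because the tasks run one after another, the job's time limit has to cover all of them together. The following is a small sketch for estimating that limit, assuming the standard sbatch --time option (not used in the scripts on this page) and an arbitrary 10 % safety margin; the function name serial_walltime is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: estimate the --time limit for a job that runs
# n tasks one after another, each taking about m minutes, plus an
# assumed safety margin of 10 %. Output uses the HH:MM:SS format.
serial_walltime() {
  local n=$1 m=$2
  local total=$(( n * m * 110 / 100 ))   # minutes, integer arithmetic
  printf '%02d:%02d:00\n' $(( total / 60 )) $(( total % 60 ))
}

serial_walltime 100 3   # 100 x 3 min = 300 min, +10 % = 330 min -> 05:30:00
```

With the rough limits given above (100 tasks of about 3 minutes each), a single job already needs more than five hours, which is where job arrays or GNU Parallel become the better choice.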

Managing many Tasks in a Job Array

Instead of a loop in the job script, you could use a loop to submit many individual jobs, one for each task. However, this puts considerable strain on the job scheduler (which can be reduced by adding sleep 1 after each submission), and SLURM provides job arrays as a better alternative.

To run our example as a job array, we can use the job script:

#!/bin/bash

### SLURM options (others can be added, e.g. for memory)
#SBATCH --partition carl.p
#SBATCH --array 1-100

# get parameter from file for each task
p=$(awk "NR==$SLURM_ARRAY_TASK_ID {print \$1}" parameter.dat)

# run task
echo -n "Testing if $p is prime? "
./isprime "$p"

Note how the SLURM environment variable SLURM_ARRAY_TASK_ID is used (in combination with awk) to read a specific line from the parameter file. Also, no loop is needed, as SLURM automatically creates an individual job for each task defined by the --array option.
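Because SLURM_ARRAY_TASK_ID is an ordinary environment variable, the parameter lookup can be checked locally before submitting anything. A minimal sketch, using a hypothetical three-line demo file in place of parameter.dat and a hypothetical helper param_for_task that wraps the same awk call as the job script:

```shell
#!/bin/bash
# Check locally which parameter each array task would read.
# The demo file is hypothetical test data standing in for
# parameter.dat (one number per line).
demo=$(mktemp)
printf '%s\n' 70 7 69 > "$demo"

# same awk call as in the job script; the task ID is passed as $1
param_for_task() {
  awk "NR==$1 {print \$1}" "$2"
}

for id in 1 2 3
do
  echo "array task $id reads p=$(param_for_task "$id" "$demo")"
done

rm -f "$demo"
```

Since #SBATCH lines are plain comments for bash, the array job script itself can be tried the same way, e.g. with SLURM_ARRAY_TASK_ID=5 bash job.sh, where job.sh is a placeholder for the name of your script.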

The job array approach is much cleaner than the loop-based one and is recommended for most problems of this kind. Note, however, that each job adds a small overhead for scheduling, starting and completing it. Individual tasks should therefore run for more than a few minutes (unlike in the example); if needed, you can combine the job array and loop approaches so that each array task processes several of the tasks. Furthermore, you should always make sure not to run too many array tasks on the cluster at the same time, e.g. by limiting your array with a % suffix (--array 1-100%10 runs at most 10 array tasks simultaneously).
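A minimal sketch of such a combination, assuming parameter.dat holds one number per line: each array task works through a block of consecutive lines, selected with sed. The chunk size of 10, the limit %5 and the helper name task_range are hypothetical choices; with --array 1-10 the ten array tasks together cover lines 1 to 100.

```shell
#!/bin/bash

### SLURM options (others can be added, e.g. for memory)
#SBATCH --partition carl.p
#SBATCH --array 1-10%5
# 1-10: ten array tasks; %5: at most five of them run at the same time

# hypothetical helper: print the first and last line of parameter.dat
# handled by array task $1 when each array task processes $2 lines
task_range() {
  local id=$1 chunk=$2
  echo "$(( id * chunk - chunk + 1 )) $(( id * chunk ))"
}

# 10 lines per array task (task ID defaults to 1 when run outside SLURM)
range=$(task_range "${SLURM_ARRAY_TASK_ID:-1}" 10)
first=${range% *}
last=${range#* }

# run the tasks of this chunk one after another
sed -n "${first},${last}p" parameter.dat | while read -r p
do
  echo -n "Testing if $p is prime? "
  ./isprime "$p"
done
```

Larger chunks mean fewer but longer array tasks; the chunk size should be chosen so that each array task runs for at least several minutes.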

More details about job arrays are described here.

Managing Many Tasks with the Parallel Command

The third approach uses the shell tool GNU Parallel, which is available as a module on the cluster, e.g.:

$ module load hpc-env/6.4
$ module load parallel

The parallel command executes tasks in parallel and manages the available resources itself. Within a SLURM job, it can be combined with srun to run each task on a subset of the resources allocated to the job. To do this, we first define a bash script for a single task that takes one or more arguments. For our example, we can write prime_task.sh as follows:

#!/bin/bash
#
# script to run single task to test number given as arg $1 for being prime
# additional random wait time to mimic different run times

# wait for a random time (10-20s)
twait=$((($RANDOM % 11) + 10))
sleep $twait

# get parameter from file based on argument $1
p=$(awk "NR==$1 {print \$1}" parameter.dat)

# test p for being prime
echo -n "Testing if $p is prime? "
./isprime "$p"

We have extended the example slightly by adding a random wait time of 10 to 20 seconds to mimic different run times for each task; otherwise it is the same script as in the job array approach. We can run this script, e.g.

$ chmod a+x prime_task.sh
$ ./prime_task.sh 80
Testing if 73 is prime? yes

The output appears only after 10 to 20 seconds due to the extra sleep command. Now we can use parallel to execute, e.g., a total of 20 tasks (given as the range {1..20}), with 10 tasks running concurrently (option -j <ntasks>), using the command:

$ parallel -N 1 -j 10 --joblog parallel.log ./prime_task.sh {1} ::: {1..20}
Testing if 70 is prime? no
Testing if 7 is prime? yes
Testing if 69 is prime? no
...

In this command, the number of tasks is determined by the range given after the :::-separator. Each value from the given range is passed as argument {1} to the shell script (the option -N 1 tells parallel to pass one argument to each task). With the option --joblog you will get a log file with information about the execution of each task. Please refer to the parallel documentation (e.g. the man pages) for more examples on how to use the tool.
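As a sketch of what the joblog can be used for, the following hypothetical helper joblog_summary lists failed tasks and adds up their run times. It assumes GNU Parallel's joblog layout: tab-separated with a header line, JobRuntime in column 4, Exitval in column 7 and the command in column 9.

```shell
#!/bin/bash
# Hypothetical helper: summarize a GNU Parallel joblog. Assumed layout:
# tab-separated, header in line 1, column 4 = JobRuntime (seconds),
# column 7 = Exitval, column 9 = Command.
joblog_summary() {
  awk -F'\t' '
    NR > 1 {
      n++
      runtime += $4
      if ($7 != 0) { failed++; print "failed: " $9 }
    }
    END { printf "%d tasks, %d failed, %.1f s total runtime\n", n, failed, runtime }
  ' "$1"
}

# usage after a run: joblog_summary parallel.log
```

Failed tasks can then be rerun with parallel's --resume-failed option, which reads the same joblog (see the parallel man page).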

To use parallel within a job script, we can combine it with srun to distribute the tasks over the allocated resources. The following job script gives you a basic example of how to do this:

#!/bin/bash

# SLURM options
#SBATCH --partition carl.p
#SBATCH --ntasks 24

# Define srun arguments:
srun="srun -n1 -N1 --exclusive"
# --exclusive     ensures srun uses distinct CPUs for each job step
# -N1 -n1         allocates a single core to each task

# Define parallel arguments:
parallel="parallel -N 1 --delay .2 -j $SLURM_NTASKS --joblog parallel_job.log"
# --delay .2        prevents overloading the controlling node on short jobs
# --resume          add if needed to use joblog to continue an interrupted run (job resubmitted)

# Run the tasks for 100 lines in parameter.dat
$parallel "$srun ./prime_task.sh {1}" ::: {1..100}

The advantage of this approach is that you can run your tasks in a single job with less overhead than a job array (important if the tasks are short) and make efficient use of the allocated resources, since parallel starts a new task as soon as a previous one finishes and therefore copes well with varying run times. It can also be friendlier to other users in certain situations, as large job arrays sometimes tend to fill up the cluster.

This guide uses material from [1]. Also note that the use of parallel should be cited; see

$ parallel --citation