How to Manage Many Jobs

Introduction

Often, you may need to run many nearly identical jobs. Several approaches can achieve this with minimal effort, and some of them are described below. Which approach suits your needs best depends on the nature of your problem, but hints for making the choice are given along the way.

In the examples below, a simple program that decides whether a number is prime will be used. The example program can be found [media:ManyTasks.tgz here]; to use it, download it and then

$ tar -zxvf ManyTasks.tgz             # to unpack
$ cd ManyTasks                        # go to directory
$ make                                # build executable

After that, you can run the program, e.g. with

$ ./isprime 73
yes

to see if a number, in this case 73, is prime (yes, it is). The idea of the example is to run isprime on every number in parameter.dat. If we want to run this on the cluster, we can therefore think of it as many jobs that are identical except for one parameter. In this context, a single such job is also called a task.
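
Assuming parameter.dat contains one number per line (which the job script in the next section also assumes), a quick serial test on the command line could look like this; it is only a sketch to illustrate the idea, not a job submission:

$ while read -r p; do echo -n "$p: "; ./isprime "$p"; done < parameter.dat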

Managing many Tasks in a Single Job

The first approach to running all the required tasks of the example is a single job script. In the job script, we can use a loop to run all the tasks:

#!/bin/bash

### SLURM options (others can be added, e.g. for memory)
#SBATCH --partition carl.p

# loop over tasks (reads parameter.dat line by line into the variable p)
cat parameter.dat | while read -r p
do
  echo -n "Testing if $p is prime? "
  ./isprime "$p"
done
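
Assuming the script above is saved as, for example, prime_tasks.sh (the file name is only illustrative), it could be submitted and monitored as follows. By default, SLURM writes the job output to a file named slurm-<jobid>.out in the submission directory.

$ sbatch prime_tasks.sh               # submit the job
$ squeue -u $USER                     # check the status of your jobs
$ less slurm-<jobid>.out              # view the output once the job has finished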

This approach has the disadvantage that only a single job runs on the cluster and the tasks are executed serially, one after another. However, if the individual tasks are very short (a few minutes at most) and the number of tasks is not too large (less than about 100), this approach can be useful.

Managing many Tasks in a Job Array