How to Use Job Dependencies

The Basics

SLURM offers the possibility to submit jobs that depend on previously submitted jobs. This is useful in several situations, as illustrated in the examples below. First, let us look at the basics, which are also explained in the man page of sbatch. Suppose we have the script print.sh, which reads

#!/bin/bash

echo "$1" >> terminal.txt
sleep 120

and we would like to run it in two jobs on the cluster, but the second job should wait for the first one to terminate. This can be achieved using a job dependency as in the following example:

[abcd1234@carl]$ sbatch -p carl.p print.sh "Hello"
Submitted batch job 19520727
[abcd1234@carl]$ sbatch -p carl.p --depend afterok:19520727 print.sh "World"
Submitted batch job 19520729

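Here, the second job depends on the first one, which is identified by its job ID. The dependency type afterok means that the second job may start only after the first one has completed successfully, that is, with an exit code of zero. The option --dependency (written as --depend in the example, short form -d) is followed by one or more dependencies in a comma-separated list. Each dependency has the form <type>:<jobid>[:<jobid>...]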
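When the second submission is scripted rather than typed by hand, the job ID has to be taken from the output of the first sbatch call and turned into a dependency string of the form described above. The following is a minimal sketch of these two steps. The helper names job_id_from_output and dependency_spec are hypothetical and not part of SLURM, and sbatch itself is not called, so the snippet can be tried without access to a cluster.

```shell
#!/bin/bash

# Hypothetical helper: extract the job ID from sbatch's
# "Submitted batch job <id>" message (passed as the first argument).
job_id_from_output() {
    local last="${1##* }"              # text after the last space
    case "$last" in
        ''|*[!0-9]*) return 1 ;;       # not a plain number: report failure
        *) echo "$last" ;;
    esac
}

# Hypothetical helper: build "<type>:<jobid>[:<jobid>...]"
# from a dependency type and one or more job IDs.
dependency_spec() {
    local type="$1"
    shift
    [ "$#" -gt 0 ] || return 1         # at least one job ID is required
    local IFS=":"                      # "$*" joins the IDs with ':'
    echo "${type}:$*"
}

# Message copied from the first submission in the example above.
first=$(job_id_from_output "Submitted batch job 19520727")
dep=$(dependency_spec afterok "$first")

# Only print the command that would be submitted.
echo "sbatch -p carl.p --dependency=$dep print.sh \"World\""
```

On the cluster, the option --parsable of sbatch prints only the job ID (followed by a semicolon and the cluster name in multi-cluster setups), which makes the text parsing in the first helper unnecessary.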

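While the jobs are in the queue, their state can be followed with squeue. The dependent job stays in the state PD (pending) with the reason (Dependency) until the job it depends on has finished:

[lees4820@hpcl002 Dependency]$ squeue -u $USER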

            JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
         19520727    carl.p print.sh lees4820 CF       0:02      1 mpcs030
         19520729    carl.p print.sh lees4820 PD       0:00      1 (Dependency)
         19520725    carl.p print.sh lees4820  R       0:34      1 mpcs085

[lees4820@hpcl002 Dependency]$ squeue -u $USER

            JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
         19520729    carl.p print.sh lees4820 PD       0:00      1 (Dependency)
         19520727    carl.p print.sh lees4820  R       0:18      1 mpcs030
         19520725    carl.p print.sh lees4820  R       0:50      1 mpcs085

[lees4820@hpcl002 Dependency]$ squeue -u $USER

            JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
         19520729    carl.p print.sh lees4820 PD       0:00      1 (Dependency)
         19520727    carl.p print.sh lees4820  R       0:33      1 mpcs030
         19520725    carl.p print.sh lees4820  R       1:05      1 mpcs085

[lees4820@hpcl002 Dependency]$ cat terminal.txt

Hello

[lees4820@hpcl002 Dependency]$ squeue -u $USER

            JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
         19520729    carl.p print.sh lees4820 PD       0:00      1 (Dependency)
         19520727    carl.p print.sh lees4820  R       1:38      1 mpcs030

[lees4820@hpcl002 Dependency]$ squeue -u $USER

            JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
         19520729    carl.p print.sh lees4820  R       1:54      1 mpcl109

[lees4820@hpcl002 Dependency]$ cat terminal.txt

Hello
World
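The check for jobs that are still held back by a dependency can also be automated. The sketch below filters squeue output for jobs in state PD with the reason (Dependency). The function name jobs_waiting_on_dependency is hypothetical, and the filter assumes the default squeue column layout shown above with job names that contain no spaces. The sample data is copied from the second squeue call in the transcript.

```shell
#!/bin/bash

# Hypothetical helper: read squeue's default output on stdin and print
# the IDs of jobs that are pending because of an unmet dependency.
# Assumes column 5 is the state (ST) and the last column is the reason.
jobs_waiting_on_dependency() {
    awk '$5 == "PD" && $NF == "(Dependency)" { print $1 }'
}

# Sample copied from the transcript above; on the cluster one would
# pipe the output of "squeue -u $USER" into the function instead.
sample='            JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
         19520729    carl.p print.sh lees4820 PD       0:00      1 (Dependency)
         19520727    carl.p print.sh lees4820  R       0:18      1 mpcs030
         19520725    carl.p print.sh lees4820  R       0:50      1 mpcs085'

waiting=$(printf '%s\n' "$sample" | jobs_waiting_on_dependency)
echo "Waiting on a dependency: $waiting"
```

For the sample, only job 19520729 is reported, since the other two jobs are already running.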

Examples

Separate Jobs to Build a Workflow

Job Chains

Dependencies and Job Arrays