Difference between revisions of "How to Use Job Dependencies"
Revision as of 15:47, 1 October 2019
The Basics
SLURM lets you submit jobs that depend on jobs you have already submitted. This is useful in several situations, as the examples below illustrate. First, let us look at the basics, which are also explained in the man page of sbatch (https://slurm.schedmd.com/sbatch.html). Suppose we have the script print.sh, which reads
#!/bin/bash
echo "$1" >> terminal.txt
sleep 120
and we would like to run it as two jobs on the cluster, with the second job waiting until the first one has finished. This can be achieved with a job dependency, as in the following example:
[abcd1234@carl]$ sbatch -p carl.p print.sh "Hello"
Submitted batch job 19520727
[abcd1234@carl]$ sbatch -p carl.p --depend afterok:19520727 print.sh "World"
Submitted batch job 19520729
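Here, the second job depends on the first one, which is identified by its job ID. The --dependency option (abbreviated above as --depend; the short form is -d) is followed by one or more dependencies in a comma-separated list. Each dependency has the form <type>:<jobid>[:<jobid>...]. The type afterok used here means that the job may start only after the listed jobs have completed successfully, that is, with an exit code of zero.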
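The dependency syntax described above can be composed programmatically when job IDs are only known at submission time. The following is a minimal sketch, not part of SLURM itself: the helper names job_id_from_sbatch, build_dependency and join_dependencies are hypothetical, and the sketch assumes that sbatch reports the job ID as the last word of its "Submitted batch job ..." message, as in the session above. It only prints the command line rather than submitting anything.

```shell
#!/bin/bash
# Hypothetical helpers (not provided by SLURM) for composing dependency specs.

# Extract the job ID from sbatch's "Submitted batch job <id>" message.
# Assumption: the ID is the last space-separated word of the message.
job_id_from_sbatch() {
    echo "${1##* }"
}

# Build one dependency of the form <type>:<jobid>[:<jobid>...].
build_dependency() {
    local spec="$1"   # dependency type, e.g. afterok
    shift
    local id
    for id in "$@"; do
        spec="$spec:$id"
    done
    echo "$spec"
}

# Join several dependencies into the comma-separated list expected by -d.
join_dependencies() {
    local IFS=,
    echo "$*"
}

msg="Submitted batch job 19520727"
first=$(job_id_from_sbatch "$msg")
dep=$(build_dependency afterok "$first")
echo "$dep"   # prints afterok:19520727

# Show the submission command instead of running it.
echo "sbatch -p carl.p --dependency=$dep print.sh \"World\""
```

On a real cluster, the message would come from a command substitution such as msg=$(sbatch -p carl.p print.sh "Hello"), and the printed command would be executed instead of echoed.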
Monitoring the queue with squeue shows that the second job stays pending (PD) with the reason (Dependency) until the first job has finished, and that terminal.txt is filled in the expected order:
[lees4820@hpcl002 Dependency]$ squeue -u $USER
JOBID     PARTITION  NAME      USER      ST  TIME  NODES  NODELIST(REASON)
19520727  carl.p     print.sh  lees4820  CF  0:02  1      mpcs030
19520729  carl.p     print.sh  lees4820  PD  0:00  1      (Dependency)
19520725  carl.p     print.sh  lees4820  R   0:34  1      mpcs085
[lees4820@hpcl002 Dependency]$ squeue -u $USER
JOBID     PARTITION  NAME      USER      ST  TIME  NODES  NODELIST(REASON)
19520729  carl.p     print.sh  lees4820  PD  0:00  1      (Dependency)
19520727  carl.p     print.sh  lees4820  R   0:18  1      mpcs030
19520725  carl.p     print.sh  lees4820  R   0:50  1      mpcs085
[lees4820@hpcl002 Dependency]$ squeue -u $USER
JOBID     PARTITION  NAME      USER      ST  TIME  NODES  NODELIST(REASON)
19520729  carl.p     print.sh  lees4820  PD  0:00  1      (Dependency)
19520727  carl.p     print.sh  lees4820  R   0:33  1      mpcs030
19520725  carl.p     print.sh  lees4820  R   1:05  1      mpcs085
[lees4820@hpcl002 Dependency]$ cat terminal.txt
Hello
[lees4820@hpcl002 Dependency]$ squeue -u $USER
JOBID     PARTITION  NAME      USER      ST  TIME  NODES  NODELIST(REASON)
19520729  carl.p     print.sh  lees4820  PD  0:00  1      (Dependency)
19520727  carl.p     print.sh  lees4820  R   1:38  1      mpcs030
[lees4820@hpcl002 Dependency]$ squeue -u $USER
JOBID     PARTITION  NAME      USER      ST  TIME  NODES  NODELIST(REASON)
19520729  carl.p     print.sh  lees4820  R   1:54  1      mpcl109
[lees4820@hpcl002 Dependency]$ cat terminal.txt
Hello
World
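Instead of reading the queue listing by eye, the jobs that are still held back by a dependency can be filtered out of the squeue output. The sketch below assumes the default squeue column layout shown above (state in the fifth column, reason in the last one); the function name pending_on_dependency is hypothetical, and the sample text is copied from the session above so that the snippet runs without a cluster.

```shell
#!/bin/bash
# Hypothetical helper: print the IDs of jobs that are pending because of
# an unmet dependency. Reads squeue output in its default layout on stdin.
pending_on_dependency() {
    # Skip the header line, then match state PD with reason (Dependency).
    awk 'NR > 1 && $5 == "PD" && $NF == "(Dependency)" { print $1 }'
}

# Sample input taken from the squeue listing above.
sample='JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
19520729 carl.p print.sh lees4820 PD 0:00 1 (Dependency)
19520727 carl.p print.sh lees4820 R 0:18 1 mpcs030
19520725 carl.p print.sh lees4820 R 0:50 1 mpcs085'

printf '%s\n' "$sample" | pending_on_dependency   # prints 19520729
```

On the cluster, the same filter would be fed from the live queue, for example with squeue -u $USER | pending_on_dependency.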