How to Use Job Dependencies


Latest revision as of 16:15, 1 October 2019

The Basics

SLURM allows you to submit jobs that depend on already submitted jobs. This is useful in several situations, as illustrated in the examples below. First, let us look at the basics, which are also explained in the man page of sbatch. Suppose we have the script print.sh, which reads

#!/bin/bash

echo "$1" >> terminal.txt
sleep 120

and we would like to run it in two jobs on the cluster, but the second job should wait for the first one to terminate. This can be achieved using a job dependency as in the following example:

[abcd1234@carl]$ sbatch -p carl.p print.sh "Hello"
Submitted batch job 19520727
[abcd1234@carl]$ sbatch -p carl.p --depend afterok:19520727 print.sh "World"
Submitted batch job 19520729

Here, the second job depends on the first one, which is identified by its job ID. The --depend option (an abbreviation of --dependency; the short form is -d) can be followed by one or more dependencies in a comma-separated list. Each dependency has the form <type>:<jobid>[:jobid...]. After the two jobs are submitted, you can use squeue to see the job status:

[abcd1234@carl]$ squeue -u $USER
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          19520729    carl.p print.sh lees4820 PD       0:00      1 (Dependency)
          19520727    carl.p print.sh lees4820  R       1:18      1 mpcs030
[abcd1234@carl]$ squeue -u $USER
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          19520729    carl.p print.sh lees4820  R       0:14      1 mpcl109

Initially, the first job is running while the second one is pending with the reason (Dependency). Once the first job has completed, the second one starts, and after it has also completed, we can check the final result:

[abcd1234@carl]$ cat terminal.txt 
Hello 
World
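
The dependency format <type>:<jobid>[:jobid...] described above can also be assembled in a script when a job has to wait for several others. The following is a minimal sketch; the helper names build_dependency and join_dependencies are hypothetical and not part of SLURM, and the job IDs are only placeholders taken from the example above.

```shell
#!/bin/bash

# Build one dependency of the form <type>:<jobid>[:jobid...]
# (hypothetical helper, not a SLURM command).
build_dependency() {
    local type=$1
    shift
    local spec=$type
    local id
    for id in "$@"; do
        spec+=":$id"
    done
    printf '%s\n' "$spec"
}

# Join several dependencies into the comma-separated list expected by -d.
join_dependencies() {
    local IFS=,
    printf '%s\n' "$*"
}

dep=$(join_dependencies \
    "$(build_dependency afterok 19520727 19520729)" \
    "$(build_dependency afterany 19520730)")
echo "$dep"
# prints: afterok:19520727:19520729,afterany:19520730
```

The resulting string could then be passed to sbatch, for example as sbatch -p carl.p -d "$dep" print.sh "done". With the comma separator, all listed dependencies have to be satisfied before the job can start.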

Several types of dependencies are available. The most important ones are listed here:

  1. afterok: the dependent job will start after the previous job has terminated with a zero exit status (success, no error). This is the normal type.
  2. afternotok: the dependent job will start after the previous job has terminated with a non-zero exit status (no success, error). This can be useful to implement some error checking or handling.
  3. afterany: the dependent job will start after the previous job has terminated, regardless of the exit status.
  4. after: the dependent job will start after the previous job has started execution, which is a rather unusual application.
  5. singleton: the dependency is based on the job name and user. The job starts only after all earlier jobs with the same name and user have ended, so only one such job is running at any given time.
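
As an illustration of how afternotok and afterany could be combined for error handling, here is a minimal sketch. It assumes three hypothetical scripts (main.sh, on_error.sh and cleanup.sh) and a hypothetical helper function submit_with_handlers; only the sbatch options themselves come from SLURM. The long form --dependency= is used here in place of --depend.

```shell
#!/bin/bash

# Hypothetical scripts: main.sh does the work, on_error.sh reports a failure,
# cleanup.sh removes temporary files.
submit_with_handlers() {
    local main_id
    main_id=$(sbatch -p carl.p --parsable main.sh) || return 1
    main_id=${main_id%%;*}   # drop a trailing ";cluster" if present

    # Runs only if main.sh ends with a non-zero exit status.
    sbatch -p carl.p --parsable --dependency=afternotok:"$main_id" on_error.sh \
        >/dev/null || return 1

    # Runs once main.sh has ended, whatever its exit status.
    sbatch -p carl.p --parsable --dependency=afterany:"$main_id" cleanup.sh \
        >/dev/null || return 1

    echo "$main_id"          # job ID of the main job
}

# Usage (on a login node where sbatch is available):
#   submit_with_handlers
```

If main.sh succeeds, the afternotok dependency can never be satisfied. Depending on the cluster configuration, the on_error.sh job may then remain pending; the sbatch option --kill-on-invalid-dep=yes asks SLURM to remove such a job instead.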

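To automate the submission of dependent jobs, you can use the fact that sbatch prints the job ID. With the --parsable option it prints only the job ID (followed by a semicolon and the cluster name, if one is present), so the output can be captured in a shell variable. The example above could be generalized in the following way: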

[abcd1234@carl]$ jid=$(sbatch -p carl.p --parsable print.sh "Hello")
[abcd1234@carl]$ jid=$(sbatch -p carl.p --depend afterok:$jid --parsable print.sh "World")
[abcd1234@carl]$ jid=$(sbatch -p carl.p --depend afterok:$jid --parsable print.sh "what is")
[abcd1234@carl]$ jid=$(sbatch -p carl.p --depend afterok:$jid --parsable print.sh "going on?")
[abcd1234@carl]$ squeue -u $USER
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          19520764    carl.p print.sh lees4820 PD       0:00      1 (Dependency)
          19520766    carl.p print.sh lees4820 PD       0:00      1 (Dependency)
          19520767    carl.p print.sh lees4820 PD       0:00      1 (Dependency)
          19520763    carl.p print.sh lees4820  R       0:50      1 mpcs013
[abcd1234@carl]$ squeue -u $USER
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          19520766    carl.p print.sh lees4820 PD       0:00      1 (Dependency)
          19520767    carl.p print.sh lees4820 PD       0:00      1 (Dependency)
          19520764    carl.p print.sh lees4820  R       0:25      1 mpcs030
[abcd1234@carl]$ cat terminal.txt # after all jobs are done
Hello 
World 
what is 
going on? 
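
The repeated commands above follow one pattern, so they could be wrapped in a loop. The following is a minimal sketch under the same assumptions as the example (partition carl.p, script print.sh); the function name submit_chain is hypothetical, and the long form --dependency= is used in place of --depend.

```shell
#!/bin/bash

# Submit print.sh once per argument; each job waits for the previous one
# to finish successfully (afterok).
submit_chain() {
    local jid="" word
    for word in "$@"; do
        if [ -z "$jid" ]; then
            # First job in the chain: no dependency.
            jid=$(sbatch -p carl.p --parsable print.sh "$word") || return 1
        else
            jid=$(sbatch -p carl.p --parsable \
                --dependency=afterok:"$jid" print.sh "$word") || return 1
        fi
        jid=${jid%%;*}   # keep only the job ID if a cluster name is appended
    done
    echo "$jid"          # job ID of the last job in the chain
}

# Usage (on a login node where sbatch is available):
#   submit_chain "Hello" "World" "what is" "going on?"
```

The function prints the ID of the last job, which can in turn serve as the dependency for further jobs.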

Examples

Separate Jobs to Build a Workflow

Job Chains

Dependencies and Job Arrays