SGE Job Management (Queueing) System
Latest revision as of 17:26, 22 January 2015
The queueing system employed to manage user jobs for FLOW and HERO is Sun Grid Engine (SGE). For first-time users (especially those acquainted with PBS-based systems), some features of SGE may seem a little unusual and certainly need some getting-accustomed-to. In order to efficiently use the available hardware resources (so that all users may benefit the most from the system), a basic understanding of how SGE works is indispensable. Some of the points to keep in mind are the following:
- Unlike other (e.g., PBS-based) queueing systems, SGE does not "know" the concept of "nodes" with a fixed number of CPUs (cores) and users specifying the number of nodes they need, along with the number of CPUs per node, in their job requirements. Instead, SGE logically divides the cluster into slots, where, roughly speaking, each "slot" may be thought of as a single CPU core (although there are notable exceptions to this rule, see the parallel environment linda below). The scheduler assigns free slots to pending jobs. Since in the multi-core era each host offers many slots, this will, in general, lead to jobs of different users running concurrently on the same host (provided that there are sufficient resources like memory, disk space etc. to meet all requirements of all jobs, as specified by the users who submitted them) and usually guarantees efficient resource utilization.
- While the scheduling behavior described above may be very efficient in optimally using the available hardware resources, it will have undesirable effects on parallel (MPI, LINDA, ...) jobs. E.g., an MPI job requesting 24 slots could end up running 3 tasks on one host, 12 tasks on another host (fully occupying this host, if it is a server with 2 six-core CPUs, as happens with our clusters), and 9 tasks on a third host. Clearly, such an unbalanced configuration may lead to problems. For certain jobs (multithreaded applications) it is even mandatory that all slots reside on one host (typical examples: OpenMP programs, Gaussian single-node jobs).
To deal with the specific demands of parallel jobs, SGE offers so-called parallel environments (PEs) which are largely configurable. Even if your job does not need several hosts, but runs on only one host using several or all cores of that host, you must specify a parallel environment. It is of crucial importance to choose the "correct" parallel environment (meeting the requirements of your application/program) when submitting a parallel job.
- Another "peculiarity" of SGE (as compared to its cousins) is the concept of cluster queues and queue instances. Cluster queues are composed of several (typically, many) queue instances, with each instance associated with one particular host. A cluster queue may have a name like, e.g., standardqueue.q, where the .q suffix is a commonly followed convention. The queue instances of this queue then have names like, e.g., standardqueue.q@host001, standardqueue.q@host002, ... (note the "@" which acts as a delimiter between the queue name and the host name).
In general, each host will hold several queue instances belonging to different cluster queues. E.g. there may be a special queue for long-running jobs and a queue for shorter jobs, both of which share the same "physical" machines but have different policies. To avoid oversubscription, resource limits can be configured for individual hosts. Since resource limits and other, more complex attributes can also be associated with cluster queues and even queue instances, the system is highly flexible and can be customized for specific needs. On the other hand, the configuration quickly tends to get rather complex, leading to unexpected side effects. E.g., PEs grab slots from all queue instances of all cluster queues they are associated with. Thus, a parallel job may occupy slots on one particular host belonging to different queue instances on that host. While this is usually no problem for the parallel job itself, it blocks resources in both cluster queues, which may be unintended. For that reason, it is common practice to associate each PE with one and only one cluster queue and define several (possibly identically configured) PEs in order to avoid a single PE spanning several cluster queues.
Submitting jobs
Sample job submission scripts for both serial and parallel jobs are provided in the subdirectory Examples of your home directory. You may have to adapt these scripts as needed. Note that a job submission script consists of two principal parts:
- SGE directives (lines starting with the "magic" characters #$), which fall into three categories:
- general options (which shell to use, name of the job, name of output and error files if differing from default, etc.). The directives are passed to the qsub command when the job is submitted.
- Resource requirements (introduced by the -l flag), like memory, disk space, runtime (wallclock) limit, etc.
- Options for parallel jobs (parallel environment, number of job slots, etc.)
- Commands to be executed by the job (your program, script, etc.), including the necessary set-up of the environment for the application/program to run correctly (loading of modules so that your programs find the required runtime libraries, etc.).
The job is submitted by the qsub command, e.g. (assuming your submission script is named myprog.sge):
qsub myprog.sge
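To illustrate the two parts described above, here is a minimal sketch of what a file like myprog.sge could contain. The resource values and the program name ./myprog are hypothetical placeholders, not recommendations. All directives used are taken from the option overview further down this page.

```shell
#!/bin/bash
# --- Part 1: SGE directives (plain comments for the shell) ---
# General options
#$ -S /bin/bash
#$ -N myprog
#$ -cwd
#$ -j y
# Resource requirements (hypothetical values, adapt to your job)
#$ -l h_rt=2:0:0
#$ -l h_vmem=1800M
#$ -l h_fsize=10G

# --- Part 2: commands executed by the job ---
# JOB_ID and NSLOTS are set by SGE inside a job; the fallbacks below
# only exist so that the script can be dry-run outside the queue.
job_id="${JOB_ID:-none}"
slots="${NSLOTS:-1}"
summary="job ${job_id} using ${slots} slot(s)"
echo "$summary"

# ./myprog    # hypothetical program, uncomment and adapt
```

Because the lines starting with #$ are ordinary comments for the shell, the script can be run directly with bash for a quick syntax check before it is handed to qsub.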
Specifying job requirements
The general philosophy behind SGE is that you should not submit your job to a specific queue or queue instance (although this is possible in principle), but rather define your requirements, and then let SGE decide which queue matches them best (taking into account the current load of the system and other factors). For this "automatic" queue selection to work efficiently and in order to avoid wasting of valuable resources (e.g., requesting much more memory than your job needs, which may prevent the scheduling of jobs of other users), it is important that you give a complete and precise specification of your job requirements in your submission script. The following points are relevant to both serial and parallel jobs.
Runtime
Maximum (wallclock) runtime is specified by h_rt=<hh:mm:ss>. E.g., a maximum runtime of three days is requested by:
#$ -l h_rt=72:0:0
The default runtime of a job is 0:0:0. Thus you should always specify a runtime, unless it is a very short job.
It is highly recommended to specify the runtime of your job as realistically as possible (leaving, of course, a margin of error). If the scheduler knows that, e.g., a pending job is a "fast run" which needs only a few hours of walltime, it is likely that it will start executing much earlier than other jobs with more extensive walltime requirements (so-called backfilling).
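As a small sketch of "realistic runtime plus a margin of error", the following helper turns an estimate in seconds into an h_rt value. The function name h_rt_with_margin and the percentage-based margin are assumptions made for illustration only; they are not part of SGE.

```shell
#!/bin/bash
# h_rt_with_margin ESTIMATE_SECONDS MARGIN_PERCENT
# Hypothetical helper: prints ESTIMATE plus MARGIN percent as h:mm:ss,
# a form usable as the value of "-l h_rt=".
h_rt_with_margin() {
    local estimate_s="$1" margin_pct="$2"
    # Integer arithmetic: add the margin, rounding down to whole seconds
    local total_s=$(( estimate_s * (100 + margin_pct) / 100 ))
    printf '%d:%02d:%02d\n' \
        $(( total_s / 3600 )) \
        $(( total_s % 3600 / 60 )) \
        $(( total_s % 60 ))
}

# Example: an estimated 3 hours (10800 s) with a 20 percent margin
h_rt_with_margin 10800 20
```

The result could then be passed on the command line, for instance as qsub -l h_rt=$(h_rt_with_margin 10800 20) myprog.sge.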
Memory
Maximum memory (physical + virtual) usage of a job is defined by the h_vmem attribute, as in
#$ -l h_vmem=4G
for a job requesting 4 GB of total memory. If your job exceeds the specified memory limit, it gets killed automatically. The default value for h_vmem is 500 MB.
Important: The h_vmem attribute refers to the memory per job slot, i.e. it gets multiplied by the number of slots for a parallel job.
Total memory available for jobs on each compute node:
- standard compute nodes of HERO (mpcs001..mpcs130): 23 GB
- big nodes of HERO (mpcb001..mpcb020): 46 GB
- low-memory nodes of FLOW (cfdl001..cfdl122): 22 GB
- high-memory nodes of FLOW (cfdh001..cfdh064): 46 GB
- Ivy-bridge nodes of FLOW (cfdi001..): 62 GB
- Nehalem nodes of FLOW (cfdx001..): 30 GB
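Since h_vmem is multiplied by the number of slots, it can help to do the arithmetic before submitting. The sketch below uses hypothetical helper names and assumes that 1 GB corresponds to 1024 MB. It checks whether a request fits into the memory of a single node from the list above.

```shell
#!/bin/bash
# total_mem_mb SLOTS PER_SLOT_MB
# Total memory a job may use: h_vmem (per slot) times the number of slots.
total_mem_mb() {
    echo $(( $1 * $2 ))
}

# fits_on_node SLOTS PER_SLOT_MB NODE_GB
# Succeeds if the total request fits into one node with NODE_GB of memory
# (assumption: 1 GB = 1024 MB).
fits_on_node() {
    local total
    total=$(total_mem_mb "$1" "$2")
    [ "$total" -le $(( $3 * 1024 )) ]
}

# Example: 12 slots with h_vmem=1800M on a standard HERO node (23 GB)
total_mem_mb 12 1800
if fits_on_node 12 1800 23; then
    echo "fits on one standard node"
fi
```

With h_vmem=2000M the same 12 slots would need 24000 MB, which is more than one standard HERO node offers.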
Limitation of written blocks/ requirements on disk space
The SGE option
#$ -l h_fsize=200G
specifies a limit on the total size of all blocks written by a job (approximately the total size of all files written). The default value is h_fsize=10G.
At the same time, the value specifies the disk space required by the job. On the FLOW low- and high-memory nodes there is no per-node limit for accepting a job.
On nodes with local disk storage (HERO nodes, express nodes on FLOW) the value is limited to the local disk size of a node. The path to the local scratch directory can be accessed in your job script (or other scripts/programs invoked by your job) via the $TMPDIR environment variable. After termination of your job (or if you kill your job manually by qdel), the scratch directory is automatically purged.
Total amount of scratch space available on each compute node:
- FLOW express nodes (cfdx001..cfdx007): 130 GB
- HERO standard nodes (mpcs001..mpcs130): 800 GB
- HERO big nodes (mpcb001..mpcb020): 2100 GB
If your job needs more than 800 GB of scratch space on HERO, you must request one of the big nodes. Example:
#$ -l h_fsize=1400G
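A common pattern with local scratch is to stage input data into $TMPDIR, work there, and copy the results back before the job ends, because the directory is purged afterwards. The following is only a sketch: the helper name stage_and_run, the file names, and the h_fsize value are hypothetical, and the tr command merely stands in for a real program.

```shell
#!/bin/bash
#$ -cwd
#$ -l h_fsize=50G

# $TMPDIR is provided by SGE inside a job. Outside a job we fall back to
# a fresh temporary directory so the script can be tested interactively.
scratch="${TMPDIR:-$(mktemp -d)}"

# stage_and_run INPUT OUTPUT (hypothetical helper)
# Copies INPUT to local scratch, processes it there, copies the result back.
stage_and_run() {
    local input="$1" output="$2"
    cp "$input" "$scratch/input.dat"
    # Placeholder for the real program: convert the input to upper case
    ( cd "$scratch" && tr 'a-z' 'A-Z' < input.dat > result.dat )
    # Save the result before the scratch directory is purged
    cp "$scratch/result.dat" "$output"
}

# Usage inside a job script (hypothetical file names):
# stage_and_run "$SGE_O_WORKDIR/input.dat" "$SGE_O_WORKDIR/result.dat"
```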
Output
By default, the stdout and stderr of a job are written to two output files named
<JOBNAME>.o<JOBID>
and
<JOBNAME>.e<JOBID>
The job name is defined by the line
#$ -N <JOBNAME>
in the SGE job script. If this option is not set, JOBNAME defaults to the name of the SGE job script. The job ID is assigned by SGE when the job is submitted. To make error tracking simpler, stdout and stderr can be merged into a single file
<JOBNAME>.o<JOBID>
by adding the option
#$ -j y
to the SGE job script.
Parallel environments (PEs)
Example: If you have an MPI program compiled and linked with the Intel Compiler and MPI library 4.1, your job submission script might look as follows:
#$ -pe impi 96
#$ -R y
module load impi
mpirun -np $NSLOTS ./myprog_intelmpi
In that case, the MPI job uses the InfiniBand fabric for communication (see the I_MPI_FABRICS variable). Turning on resource reservation (-R y) is highly recommended in order to prevent parallel jobs from being starved by serial jobs which "block" required slots on specific hosts. The job requests 96 cores. The allocation rule of this PE is "fill-up", i.e. SGE tries to place the MPI tasks on as few hosts as possible (in the "ideal" case, the program would run on exactly 8 hosts with 12 cores or slots on each host, but there is no guarantee that this is going to happen).
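The "8 hosts" figure follows from dividing the requested slots by the cores per host and rounding up. A minimal sketch of that arithmetic, assuming 12 cores per host (two six-core CPUs, as mentioned in the introduction) and using a hypothetical helper name:

```shell
#!/bin/bash
# min_hosts SLOTS CORES_PER_HOST
# Smallest number of hosts that can hold SLOTS tasks when every host
# contributes at most CORES_PER_HOST slots (ceiling division).
min_hosts() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Example: 96 slots on hosts with 12 cores each
min_hosts 96 12
```

This is only the best case under "fill-up". If some hosts are partially occupied by other jobs, the tasks will be spread over more hosts.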
Please have a look at the directory named Examples in your home directory, which contains other examples of how to submit parallel (MPI) jobs.
Current parallel environments are listed below:
Name | Description |
---|---|
impi | Intel MPI |
mpich | Mpich |
mpich2 | Mpich 2 |
openmpi | OpenMPI |
smp | Shared memory parallelized programs |
ansys | ANSYS CFX |
linda | Linda |
mdsc | Matlab Distributed Computing Server |
molcas | MOLCAS |
starccmp | StarCCM+ |
Job dependencies
SGE allows you to define job dependencies, e.g. job B should start only after job A has finished. This is done by adding the option -hold_jid. In the example, the job script of job A looks like
#!/bin/bash
...
#$ -N jobA
...
Then the job script of job B should contain
#!/bin/bash
...
#$ -hold_jid jobA
...
This ensures that job B waits for job A.
Note: It is important that job A is submitted before job B. Otherwise SGE assumes that job A has already finished!
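The same dependency can be expressed on the qsub command line, where the submission order is explicit. The sketch below is a dry run: the hypothetical helper chain_commands only prints the qsub commands for a chain of jobs (each one held on its predecessor) and does not submit anything. The script names are placeholders.

```shell
#!/bin/bash
# chain_commands NAME:SCRIPT [NAME:SCRIPT ...]
# Hypothetical dry-run helper: prints one qsub command per job, in
# submission order, with every job held on the job named before it.
chain_commands() {
    local prev="" item name script
    for item in "$@"; do
        name="${item%%:*}"      # part before the first colon
        script="${item#*:}"     # part after the first colon
        if [ -z "$prev" ]; then
            echo "qsub -N $name $script"
        else
            echo "qsub -N $name -hold_jid $prev $script"
        fi
        prev="$name"
    done
}

# Example: job B waits for job A
chain_commands jobA:a.sge jobB:b.sge
```

Running the printed commands from top to bottom guarantees that each job is submitted before the job that depends on it.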
Array jobs
Array jobs are a very efficient way of managing your jobs under certain circumstances (e.g., if you have to run one identical program many times on different data sets, with different initial conditions, etc.). The key point is that the job is started N times, where each instance (task) gets a task ID via the environment variable SGE_TASK_ID. This makes it possible, for example, to run a program on several files with one job.
For example, suppose the program a.out should act on the files input.1, input.2, ..., input.10 and you want to store the output of a.out in the files stdout.1, stdout.2, ..., stdout.10. This can be done with the following SGE script:
#$ -t 1-10
#$ -cwd
./a.out input.$SGE_TASK_ID > stdout.$SGE_TASK_ID
By submitting the script the job will be started 10 times under one job ID but with 10 task IDs. By adding the option
#$ -tc 2
you can limit the number of tasks running at the same time to 2.
For further information please see the corresponding Section in the official documentation of Sun Grid Engine.
Note for FLOW users: On FLOW, each task occupies a complete node by default. For serial (non-parallel) array jobs this would be a waste of resources. To disable the exclusive usage, please add the following line to your script
#$ -l excl_flow=false
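If the input files do not follow a numbered naming scheme, the task ID can be used as a line number into a list of file names instead. This is a sketch under assumptions: the helper nth_line and the list file inputs.txt are hypothetical, and the list is expected to contain one file name per line.

```shell
#!/bin/bash
#$ -t 1-3
#$ -cwd

# nth_line FILE N
# Prints line number N of FILE (hypothetical helper based on sed).
nth_line() {
    sed -n "${2}p" "$1"
}

# Usage inside the array job (hypothetical file names):
# input="$(nth_line inputs.txt "$SGE_TASK_ID")"
# ./a.out "$input" > "stdout.$SGE_TASK_ID"
```

The range given to -t should match the number of lines in the list, so that every task finds a file name.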
Overview of available options
Here is a short summary of the SGE job requirement options. More information is available via man qsub or in the documentation.
Option (#$ ... or qsub ...) | Example | Description | Available on FLOW | Available on HERO |
---|---|---|---|---|
-hold_jid | -hold_jid JobName | Holds the job until the job with the given name has finished. | X | X |
-l h_rt= | -l h_rt=05:30:00 | Sets the runtime limit of the job, e.g. 5 h 30 min. This option has to be set! | X | X |
-l h_vmem= | -l h_vmem=1800M | Maximum memory (physical + virtual) usage of a job per slot (core), e.g. 1800 MB per core (default value 1200 MB). | X | X |
-l excl_flow=false | | Disables the exclusive node reservation for jobs. Use this flag only for serial (non-parallel) jobs. | X | |
-l h_fsize= | -l h_fsize=200G | Specifies the total number of disk blocks that a job may write to disk (default value 10 GB). | X | X |
-N | -N TESTCASE | Defines the job name, e.g. TESTCASE. | X | X |
-j y | | Merges stdout and stderr of the job output. | X | X |
-pe | -pe impi 48 | Sets the parallel environment, e.g. Intel MPI and 48 cores. | X | X |
-R y | | Turns on resource reservation (highly recommended!). | X | X |
-S | -S /bin/bash | Selects the shell in which the job starts. | X | X |
-cwd | | Starts the job in the directory from which it was submitted. | X | X |
-o | -o /home/my/job_stdout | Redefines the name of the stdout file. | X | X |
-e | -e /home/my/job_stderr | Redefines the name of the stderr file. | X | X |
-p | -p -100 | Redefines the priority of the job (default 0, the priority can only be lowered), e.g. the priority is lowered to -100. | X | X |
-P | -P projectname | Defines a project name so that the job is accounted to a specific project. This option should usually not be used. The project name has to be approved by the SGE admins and the user must belong to the project, otherwise the job will be rejected. | X | X |
-t | -t 2-10:2 | Defines a job array, e.g. with the task IDs 2, 4, 6, 8 and 10. | X | X |
-tc | -tc 2 | Defines the maximum number of tasks of a job array running at the same time, e.g. 2 tasks. | X | X |
Interactive jobs
Interactive jobs are only allowed for members of certain groups from the Institute of Psychology who have special data pre-processing needs which require manual intervention and cannot be automated (the prerequisite for writing a batch job script).
Users who are entitled to submit interactive jobs type
qlogin
on the command line. After that, a graphical Matlab session can be started by issuing the following two commands:
module load matlab
matlab &
(Sending the Matlab process to the background gives you control over the shell, which may be useful.) If you do not specify any memory requirements, your interactive job will be limited to using at most 500 MB. If you need more (e.g., 2 GB), you have to request the memory explicitly, as in:
qlogin -l h_vmem=2G
Note that the syntax is the same as for resource requests in a job submission script (a resource request starts with the "-l" flag).
Monitoring and managing your jobs
A selection of the most frequently used commands for job monitoring and management:
- qstat: display all (pending, running, ...) jobs of the user (output is empty if user has no jobs in the system).
- qstat -j <jobid>: get a more verbose output, which is particularly useful when analyzing why your job won't run.
- qdel <jobid>: kill job with specified ID (users can, of course, only kill their own jobs).
- qalter: Modify a pending or running job.
- qhost: display state of all hosts.
- qfreenodes: display the number of free nodes and free cores.
Note that there is also a GUI to SGE, invoked by the command qmon.
Environment variables
Within an SGE job, several environment variables are set by SGE. A short list is given in the following table.
Environment variable | Description |
---|---|
JOB_ID | Job ID. |
JOB_NAME | Name of the job (e.g. defined by SGE option -N). |
JOB_SCRIPT | Name of the submitted job. |
PE_HOSTFILE | Hostfile of the parallel environment. |
NHOSTS | Number of used hosts. |
NSLOTS | Number of used slots (cores). |
SGE_O_HOME | Home directory of the submitter. |
SGE_O_HOST | Submitting host. |
SGE_O_LOGNAME | User name of the job submitter. |
SGE_STDERR_PATH | Output file of stderr. |
SGE_STDOUT_PATH | Output file of stdout. |
SGE_TASK_FIRST | First task ID of job array. Only defined in job arrays! |
SGE_TASK_ID | Task ID of job within a job array. Only defined in job arrays! |
SGE_TASK_LAST | Last task ID of job array. Only defined in job arrays! |
SGE_TASK_STEPSIZE | Stepsize within the task ID of job arrays. Only defined in job arrays! |
SGE_O_WORKDIR | Submission directory. |
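As an illustration of how these variables can be used, the sketch below summarizes the file that PE_HOSTFILE points to. It assumes the usual layout of one line per host, with the host name in the first column and the number of slots in the second. The helper name summarize_hostfile is hypothetical.

```shell
#!/bin/bash
# summarize_hostfile FILE
# Prints "host slots" for every line of a PE hostfile and a final
# "total N" line. Assumes column 1 = host name, column 2 = slot count.
summarize_hostfile() {
    awk '{ printf "%s %d\n", $1, $2; total += $2 }
         END { printf "total %d\n", total }' "$1"
}

# Usage inside a parallel job:
# summarize_hostfile "$PE_HOSTFILE"
```

Inside a job the reported total should agree with NSLOTS, and the number of host lines with NHOSTS, which makes this a quick way to see how unevenly the slots were distributed.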
Documentation
- Grid Engine User Guide
Note that the above on-line documentation refers to a slightly newer version than installed on our HPC systems (6.2u7 vs. 6.2u5). In practice, that should not make much of a difference, though. Unfortunately, all of the original documentation has disappeared from the Web since the acquisition of SUN by Oracle, and it has since become difficult to get useful on-line documentation for "older" versions of SGE.
- The following PDFs contain the original documentation of SGE 6.2u5 (converted from the webpages):