Basic Examples MDCS 2016

From HPC users

Latest revision as of 08:06, 13 June 2022

Basic MDCS usage: Example of a task-parallel job

The following example illustrates how to submit a job to the HPC cluster (or CARL and EDDY). After you have entered the batch command and the command returned with no error, you could, e.g., do some other MATLAB computations, close your MATLAB session, or even power down your machine. In the meantime, your MDCS job gets executed "in the background". After it has finished, you can analyse the results, visualize them, etc.

As an elementary example of an "embarrassingly parallel" (or "task-parallel" in MATLAB terminology) problem we use a parameter sweep of a 2nd order ODE (the damped Harmonic Oscillator). The parameters varied are the spring constant (or equivalently, the eigenfrequency) and the damping. For each pair of parameter values and fixed initial condition, the solution of the ODE is calculated (up to a certain maximum time) and the peak value of the amplitude is calculated and stored in an array.

The parameter sweep is achieved in a for loop. This can easily be parallelized using parfor instead. The total execution time of the parfor loop is measured with the tic and toc commands.
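The structure of such a sweep can be sketched as follows. This is only an illustrative stand-in, not the content of the downloadable paramSweep_batch.m or odesystem.m: the mass, the parameter ranges, the integration time, and the initial condition are assumed values, and the variable names (bVals, kVals, peakVals, t1) are chosen to match the fields used later when the job data is loaded.

```matlab
% Hypothetical parameter sweep of a damped harmonic oscillator,
%   m*x'' + b*x' + k*x = 0,
% written as a first-order system y = [x; x'].
% All numerical values below are assumptions for illustration.
m     = 5;                          % mass
bVals = linspace(0.1, 5, 20);       % damping values
kVals = linspace(1.5, 5, 20);       % stiffness values
[bGrid, kGrid] = meshgrid(bVals, kVals);
peakVals = nan(size(kGrid));        % rows follow kVals, columns follow bVals

t0 = tic;
parfor idx = 1:numel(kGrid)         % iterations are independent of each other
    b = bGrid(idx);
    k = kGrid(idx);
    rhs = @(t, y) [y(2); -(b/m)*y(2) - (k/m)*y(1)];
    [~, y] = ode45(rhs, [0 25], [0; 1]);   % fixed initial condition x=0, x'=1
    peakVals(idx) = max(y(:, 1));          % peak displacement for this pair
end
t1 = toc(t0);                       % total execution time of the loop
```

Replacing parfor with for gives the serial version of the same loop; without an open pool, parfor simply runs the iterations in the local session.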

Please download the following zip-file containing the definition of the ODE system in odesystem.m and the MATLAB script for the parameter sweep in paramSweep_batch.m: Download

One way to run this code in parallel is to open a pool of workers on the cluster and then execute the script that performs the parameter sweep. MATLAB then automatically distributes the independent loop iterations across the available workers (or "labs" in this case). First define a "scheduler object" sched which describes your cluster configuration (in the example setup below, the configuration is named CARL), then submit the job to the cluster with the batch command:

sched = parcluster('CARL');
job = batch(sched, 'paramSweep_batch', 'Pool', 7, 'AttachedFiles', {'odesystem.m'});

The first time you submit a job to the cluster in a MATLAB session, you will be prompted for your credentials (username, password). Please enter your usual cluster account data. The number of "labs" is 1 + the number specified after the 'Pool' keyword. Thus in the above example, the job would run on eight workers.

The 'AttachedFiles' keyword specifies file dependencies: it indicates that the script paramSweep_batch.m depends on odesystem.m, so the latter must be copied to the cluster for the script to run there (that is one of the purposes of the local and remote "job data" directories that must be specified in the configuration of the scheduler). However, MATLAB detects many dependencies on its own and attaches them automatically. To see which files were attached, use the command

listAutoAttachedFiles(job)

after job submission. This feature is controlled by the AutoAttachFiles parameter of the batch command, which is set to true by default.

Checking the Status of a Submitted Job

Check the state of the job with the Job Monitor (from the main menu: Parallel -> Job Monitor), or in the command window:

job.State

By typing

job

you get additional useful information like, e.g., the start time of the job (if it has already started running), the current runtime, etc. Another useful command is

sched.getJobClusterData(job)

which will return, among other information, the job-ID from the cluster manager which can be used in turn to get more information later on (with the sacct command).
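If you prefer to block at the MATLAB prompt instead of polling by hand, the following sketch may help. It assumes that job is the handle returned by the batch command above; wait and diary are standard job methods of the Parallel Computing Toolbox, and the 60-second timeout is an arbitrary choice.

```matlab
% Assumes 'job' was returned by batch(sched, ...) as shown above.
finished = wait(job, 'finished', 60);   % wait up to 60 s; returns true or false

if finished
    diary(job);                         % show the command-window output of the job
else
    fprintf('Job not finished yet, current state: %s\n', job.State);
end
```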

Retrieving Job Data

In order to analyze results after the job has finished, load the job data into the workspace:

jobData=load(job);

Check the runtime of the job:

jobData.t1

You can visualize the results by:

figure;
f=surf(jobData.bVals, jobData.kVals, jobData.peakVals);
set(f,'LineStyle','none');
set(f,'FaceAlpha',0.5);
xlabel('Damping'); ylabel('Stiffness'); zlabel('Peak Displacement');
view(50, 30);

If you no longer need the job, clean up (includes deletion of all files related to the job in the local job data directory):

delete(job);

The following table shows the runtime measured on the CARL cluster as a function of the number of workers:

{|
! Number of tasks (SLURM) !! Number of workers (Pool option) !! Runtime (in sec.)
|-
| 1 (serial job) || 1 || 62
|-
| 2 || 1 || 59
|-
| 4 || 3 || 22
|-
| 8 || 7 || 11
|-
| 16 || 15 || 7
|-
| 32 || 31 || 4
|}

Obviously, the simple parallelization strategy using the parfor loop leads to a significant speed-up in this case.
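To put a number on the speed-up, the runtimes from the table can be compared with the serial runtime. The short calculation below is only a sketch; in particular, defining the efficiency relative to the number of SLURM tasks is an assumption made here for illustration.

```matlab
% Runtimes copied from the table above.
tasks   = [1 2 4 8 16 32];       % number of SLURM tasks
runtime = [62 59 22 11 7 4];     % runtime in seconds

speedup    = runtime(1) ./ runtime;   % relative to the serial job
efficiency = speedup ./ tasks;        % assumed definition: speed-up per task

fprintf('%5s %8s %8s %10s\n', 'tasks', 'runtime', 'speedup', 'efficiency');
fprintf('%5d %8d %8.2f %10.2f\n', [tasks; runtime; speedup; efficiency]);
```

With 32 tasks the job runs 15.5 times faster than the serial job, so the gain per additional task shrinks as the pool grows.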

Advanced usage: Specifying resources

Any job running on the cluster must specify certain resources (e.g., runtime, memory, disk space) which are passed to SLURM as options to the sbatch command. If you submit jobs to the cluster via MDCS, you do not directly access SLURM and, usually, do not have to care at all about these resource requests. MATLAB has been configured to choose reasonable default values (e.g., partition carl.p and eddy.p, a runtime of 24 hours, 2 GB memory per worker, 50 MB disk space) and to correctly pass them to SLURM. The default values for runtime, memory, and disk space are printed when you submit a MATLAB job via MDCS.

However, it is possible to modify these resource requirements for an MDCS job from within the MATLAB session if necessary (e.g., if your job runs longer than 24 hours, needs more memory, etc.). That also makes sense if the requirements are significantly lower than the default values (e.g., the runtime is only one hour, the memory requirements are much lower, etc.), since that avoids unnecessary blocking of resources and also increases the chance that the job starts earlier if the cluster is under heavy load. In principle, everything that one would usually write into a SLURM job submission script can also be specified from within a MATLAB session when one submits an MDCS job.

The following table shows the resource specifications that have been implemented and their defaults:

{|
! Resource Specification !! Default Value !! Comment
|-
| partition || carl.p,eddy.p || change to e.g. mpcb.p to request a specific type of node (see the list of available partitions)
|-
| runtime || 24:0:0 (24h) || shorter jobs will have shorter wait times due to backfilling of SLURM
|-
| memory || 2G || memory is per worker, see hardware overview for details on memory per node
|-
| diskspace || 50M || for temporary storage on local disks, used for storing attached files (so typically not very large files)
|-
| taskspernode || 0 || control the distribution of tasks on nodes, default 0 means no specific distribution, can be useful to make sure your job is distributed in a specific way
|-
| mailuser || not set || specify an e-mail address to let Slurm send out notifications about the job status
|-
| mailtype || ALL || control the reasons for Slurm to send an e-mail to the specified mailuser (no e-mails are sent if no mailuser is specified)
|-
| ngpus || 0 || request to allocate one or more GPUs per node requested (experimental), make sure to also select a partition with GPU-nodes
|}

These correspond, in SLURM language, to the --partition, --time, --mem-per-cpu, --gres=tmpdir:, --tasks-per-node, --mail-user, --mail-type, and --gres=gpu: parameters of the sbatch command, respectively. The runtime must be specified in the format hh:mm:ss (hours, minutes, seconds) and must not be longer than 21 days. The memory-type parameters must be specified as a positive integer followed by K (for Kilobyte), M (for Megabyte), or G (for Gigabyte). If you want to change any of these resources from its default, you have to add optional arguments to the cluster profile sched, as shown in the examples below. For older Matlab versions, you instead add them to one of the functions independentSubmitFcn or communicatingSubmitFcn, depending on the type of your job (e.g., if you request a 'Pool', it is by definition always a "communicating" job).
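The format rules above can be checked before a job is submitted. The helpers below are hypothetical and are not part of MATLAB or of the cluster integration files; they simply encode the rules as stated (hh:mm:ss with at most 21 days, and a positive integer followed by K, M, or G), and accept one- or two-digit minutes and seconds since both 24:0:0 and 72:00:00 appear on this page.

```matlab
% Hypothetical validation helpers (illustration only).
isValidMem = @(s) ~isempty(regexp(s, '^[1-9]\d*[KMG]$', 'once'));

% Convert 'hh:mm:ss' to seconds: sscanf returns the column vector [hh; mm; ss].
toSeconds = @(s) [3600 60 1] * sscanf(s, '%d:%d:%d');

maxSeconds = 21 * 24 * 3600;    % upper limit of 21 days
isValidRuntime = @(s) ~isempty(regexp(s, '^\d+:\d{1,2}:\d{1,2}$', 'once')) ...
    && toSeconds(s) <= maxSeconds;

% Example checks for the values used in the example below.
okMem     = isValidMem('4G');
okRuntime = isValidRuntime('72:0:0');
```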

Example (modifying runtime and memory)

Suppose your parallel (Pool) job has a runtime of at most 3 days (with a suitable safety margin) and needs 4 GB of memory per worker. If you are a CARL user, you would then first define a "scheduler" object with

sched = parcluster('CARL');

which will have the default values set for e.g. runtime and memory. The defaults can be changed; the required commands depend on the version of Matlab.

Before Matlab R2017b, you need to modify the CommunicatingSubmitFcn (supply additional parameters) of that object:

set(sched, 'CommunicatingSubmitFcn', cat(2, sched.CommunicatingSubmitFcn, {'runtime','72:0:0','memory','4G'}));

Other parameters from the list above can be added in a similar way. It is also possible to modify existing values, or to simply add the same parameter again (the last addition should be the one that is used).

Since Matlab R2017b, changing the default values has become a little easier. In our example the commands

sched.AdditionalProperties.memory='4G' 
sched.AdditionalProperties.runtime='72:00:00' 

will add the necessary changes to the scheduler object. Other parameters from the list above can be treated in the same way.

The job can be submitted, e.g., via the batch command, but you also have to specify the scheduler object as first argument since otherwise the default configuration would be chosen:

job = batch(sched, 'paramSweep_batch', 'Pool', 7, 'AttachedFiles', {'odesystem.m'});

You will see the modified values of the resources from the messages printed when you submit the job.

If you need to remove a value from the sched.AdditionalProperties to get back to the default you can use e.g. the command

remove(sched.AdditionalProperties, 'memory')

Influencing the Node Distribution of Workers

Since Matlab 2017b the Slurm integration files include the option to set a value for taskspernode which adds the corresponding option to the sbatch-command executed in the background of MDCS. This allows you to influence how the pool of worker processes is distributed among different nodes and it should be used to constrain your Matlab jobs to fewer nodes for efficiency.

With Matlab 2019b the Slurm integration files use fewer nodes by default, and the new parameter nodes can be used to modify the default behavior (in addition to taskspernode). The following table shows how you can achieve the desired distribution:

{|
! Resource Specification !! Value !! Effect
|-
| nodes || not set (default) || limits the job to<br> a) 1 node if less than half of the cores of a single node are used, <br> b) n to 2*n nodes if the job does not fit on a single node (with n being the minimal number of nodes needed)
|-
| nodes || m || limits the job to exactly m nodes (workers are distributed freely between the nodes)
|-
| nodes || -m || limits the job to up to m nodes with a minimum number determined automatically
|-
| nodes || 0 || disables the setting of --nodes so that the workers are distributed freely on any number of nodes (this is the old default behavior, which may lead to faster scheduling of your job but a longer run time)
|-
| taskspernode || t || distributes t workers per node (pool size plus one must be a multiple of t; this setting cannot be combined with setting nodes)
|}

The aim of this new setting is to make Matlab jobs run more efficiently on fewer nodes. You can change the default behavior if needed, and some sanity checks are in place to make sure the setup is valid.
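As a sketch of how these settings might be applied, the following uses the AdditionalProperties mechanism described above for R2017b and later. Whether the integration files expect taskspernode and nodes as numbers or as character vectors is not stated on this page; numeric values are assumed here.

```matlab
% Sketch for R2017b and later; numeric values are an assumption.
sched = parcluster('CARL');

% Pool of 7 means 8 workers in total; 8 is a multiple of 4, as required.
sched.AdditionalProperties.taskspernode = 4;

% Alternative (Matlab 2019b integration files): set nodes instead of taskspernode.
% The two settings are exclusive, so use only one of them.
% sched.AdditionalProperties.nodes = 2;

job = batch(sched, 'paramSweep_batch', 'Pool', 7, 'AttachedFiles', {'odesystem.m'});
```

With four workers per node, the eight workers of this job would be expected to end up on two nodes.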

Common Errors and Warnings

  • Warning: Path not found or nonexistent directory
    You will probably notice that your Matlab jobs finish with warnings (you will see an exclamation mark in a yellow triangle in the job monitor or a warning is printed when you load the job data). If the warning is about a "Path not found" or a "nonexistent directory" you can ignore these warnings. Matlab remembers the directory from where you started your job and the worker running on the cluster will try to change to that same directory.
    If you want to suppress warnings about a wrong or missing directory, you can add either 'CurrentFolder', '.' (R2016b and before) or 'AutoAddClientPath', false (R2017b and later) to the batch command. E.g.:
job = batch(sched, 'paramSweep_batch', 'Pool', 7, 'AutoAddClientPath', false);