Configuration MDCS

Introduction

In general, jobs on the HPC clusters FLOW and HERO have to be submitted via the Sun Grid Engine (SGE), which takes care of sharing the available resources (mainly CPU cores and memory). For Matlab jobs, the Matlab Distributed Computing Server (MDCS) provides an easier interface for users running Matlab on their local computer (referred to as the client). Basically, a few Matlab commands on the client generate a set of files, which are transferred to the cluster, where an SGE job is submitted. Once this job is completed, the results are automatically transferred back to the client.

For this process to work, some configuration has to be done on the client side. The required steps are explained below.

Currently, the details of the configuration depend on the version of Matlab used. For version R2011b and earlier, please refer to this guide. For R2014b and later, read on below.

Prerequisites

The following preparations are required before the configuration of the client can be done:

  1. Some of the configuration steps require administration rights on your local machine. Talk to your local system administrator in case you do not have these rights.
  2. Install a version of Matlab that is supported on the cluster on your local machine (the client). The currently supported versions are R2011b, R2014b, and R2015b. To install Matlab, please refer to instructions on the web page of the IT services (under Academic License follow the link to the download section - a login is required).
  3. Identify the matlabrootdir on your machine. On Linux systems this is usually /usr/local/MATLAB/R2014b/ and on Windows systems C:\Program Files\MATLAB\R2014b
  4. You must be able to connect from your local machine to the login nodes of the cluster via ssh. See Logging in to the system for more information. Windows users can use programs like PuTTY or MobaXterm for this purpose.
  5. You will need to know the fully qualified domain name or the IP address of your local machine, for example celeborn.fk5.uni-oldenburg.de or 134.106.219.162. To find this out, log in to the cluster and type the command who, then find your user name (abcd1234) in the list that appears. The last entry in that line, in parentheses, is your hostname.
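
If you have copied your line from the output of who, the hostname can also be picked out with a few lines of Matlab. This is only a minimal sketch: the sample line below is made up for illustration (it reuses the placeholder user name and hostname from above), and the exact column layout printed by who may differ on the cluster.

```matlab
% Hypothetical example line as printed by `who` on a login node.
whoLine = 'abcd1234 pts/12       2015-11-03 09:14 (celeborn.fk5.uni-oldenburg.de)';

% Capture the text between the parentheses at the end of the line.
tok = regexp(whoLine, '\(([^()]+)\)\s*$', 'tokens', 'once');

if isempty(tok)
    error('No hostname in parentheses found in: %s', whoLine);
end
clientHost = tok{1};
fprintf('Client hostname: %s\n', clientHost);
```

The value of clientHost is what has to be entered in the pctconfig line of startup.m in the next section.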

System-wide Integration

The following steps need to be done once (per version of Matlab) on a local machine running Matlab:

  1. Download the MDCS SGE-integration files appropriate for the version of Matlab that you want to use:
  2. Unpack the zip file. You should find the following files:
    1. communicatingJobWrapper.sh
    2. communicatingSubmitFcn.m
    3. createSubmitScript.m
    4. deleteJobFcn.m
    5. extractJobId.m
    6. getJobStateFcn.m
    7. getRemoteConnection.m
    8. getSubmitString.m
    9. independentJobWrapper.sh
    10. independentSubmitFcn.m
    11. README
    12. startup.m
  3. In the file startup.m, find the line pctconfig('hostname','myhost.uni-oldenburg.de'); and replace myhost.uni-oldenburg.de with the hostname or IP address of your local machine, e.g. pctconfig('hostname','134.106.219.162');
    Alternatively, you can add the line above to an already existing startup.m to prevent overwriting your Matlab startup settings (in that case omit copying startup.m in the next step).
  4. Copy all the files from above (the README is not needed) to the directory matlabroot/toolbox/local/. You need to have admin rights for this step.
  5. If the version of Matlab you want to configure is currently running, then exit it (and restart it for the next section).
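
The copy step can also be carried out from within Matlab. The sketch below assumes that the zip file was unpacked into the folder given in srcDir (a hypothetical path that you have to adjust) and that Matlab was started with sufficient rights to write to the toolbox/local directory. It skips the README and does not overwrite an existing startup.m, in which case the pctconfig line has to be added to that file by hand.

```matlab
% Assumption: folder where the integration files were unpacked (placeholder path).
srcDir  = '/path/to/unpacked/files';
% Target directory inside the local Matlab installation.
destDir = fullfile(matlabroot, 'toolbox', 'local');

files = dir(srcDir);
for k = 1:numel(files)
    name = files(k).name;
    % Skip sub-directories (including . and ..) and the README.
    if files(k).isdir || strcmp(name, 'README')
        continue;
    end
    target = fullfile(destDir, name);
    % Keep an existing startup.m untouched.
    if strcmp(name, 'startup.m') && exist(target, 'file') == 2
        fprintf('Keeping existing %s\n', target);
        continue;
    end
    [ok, msg] = copyfile(fullfile(srcDir, name), target);
    if ~ok
        error('Could not copy %s: %s', name, msg);
    end
end
```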

User-specific configuration

The following steps have to be performed once by every user after the system-wide integration has been completed (see above). This part of the configuration also differs between versions of Matlab. Here are the steps for R2014b and later.

  1. (Re)Start Matlab R2014b (or later) on your local machine (this is required so that the previous configuration becomes active).
  2. In the HOME tab under Environment click on Parallel-->Manage Cluster Profiles.
  3. In the new window click on Add-->Custom-->Generic. Confirm with OK.
  4. A new Cluster Profile appears in the list with the name GenericProfile. Right-click and select Rename. Enter 'HERO' (without the ') as the new name (if you prefer a different name, e.g. FLOW, then you have to replace HERO with your name in the later examples).
  5. Right-click on the new profile name and select Edit.
  6. Now enter the following information (from top to bottom):
    • Description: HERO (can be anything really)
    • JobStorageLocation: a directory on your local machine where Matlab can store data from submitted jobs. On Linux systems this could be
/home/harfst/MATLAB/R2014b/JobData

and on Windows systems

C:\Users\Stefan Harfst\Documents\MATLAB\R2014b\JobData

If the directory does not exist, create it. It is recommended to use a separate directory for each version of Matlab.

    • NumWorkers: 36 (this is the maximum number of workers per job; for the validation, reduce this number to 4)
    • ClusterMatlabRoot: where Matlab is installed on the cluster, enter the following (for R2014b)
/cm/shared/uniol/apps/matlab/r2014b
    • Skip to Submit Functions and enter for IndependentSubmitFcn
{@independentSubmitFcn, 'hero.hpc.uni-oldenburg.de', '/data/work/hrz/abcd1234/MATLAB/R2014b/JobData'}

and for the CommunicatingSubmitFcn

{@communicatingSubmitFcn, 'hero.hpc.uni-oldenburg.de', '/data/work/hrz/abcd1234/MATLAB/R2014b/JobData'}

FLOW users should replace hero with flow. The last entry is a directory on the cluster where Matlab stores job data at run time. You can use /user or /data/work for this purpose. Replace hrz/abcd1234 with the location of your home or work directory. Create the directory if it does not exist, and use a different directory for every version of Matlab.

    • OperatingSystem: set to 'unix'
    • HasSharedFilesystem: set to 'false'
    • Skip ahead to Jobs and Task Functions and enter for GetJobStateFcn
@getJobStateFcn

and for DeleteJobFcn

@deleteJobFcn
    • Click on Done.
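
The same settings can also be entered programmatically instead of through the Cluster Profile Manager. The following is a sketch for R2014b, assuming that the integration files from the previous section are on the Matlab path and that no profile named HERO exists yet. The local and remote JobData paths are the placeholder values from above and have to be adapted. It relies on the parallel.cluster.Generic class and the saveAsProfile function of the Parallel Computing Toolbox as available in R2014b; newer releases use a different generic scheduler interface, so treat this as an illustration and not as a tested recipe.

```matlab
% Placeholders taken from the examples above; adapt them to your account.
localJobData  = '/home/harfst/MATLAB/R2014b/JobData';
remoteJobData = '/data/work/hrz/abcd1234/MATLAB/R2014b/JobData';
clusterHost   = 'hero.hpc.uni-oldenburg.de';   % FLOW users: replace hero with flow

% The local job storage directory has to exist.
if ~exist(localJobData, 'dir')
    mkdir(localJobData);
end

% Generic cluster object with the same values as entered in the dialog.
c = parallel.cluster.Generic('JobStorageLocation', localJobData);
c.NumWorkers          = 36;
c.ClusterMatlabRoot   = '/cm/shared/uniol/apps/matlab/r2014b';
c.OperatingSystem     = 'unix';
c.HasSharedFilesystem = false;
c.IndependentSubmitFcn   = {@independentSubmitFcn,   clusterHost, remoteJobData};
c.CommunicatingSubmitFcn = {@communicatingSubmitFcn, clusterHost, remoteJobData};
c.GetJobStateFcn = @getJobStateFcn;
c.DeleteJobFcn   = @deleteJobFcn;

% Store the settings under the profile name used in this guide.
saveAsProfile(c, 'HERO');
```

After this, the profile should show up in the Cluster Profile Manager, where it can be validated in the same way as a profile created by hand.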

Validation

After completing the previous step, you can validate your configuration.

  1. Right-click on the profile name again and select Edit. Change the maximum number of workers (NumWorkers) from 36 to 4 and confirm the change by clicking Done.
  2. Click on the Validation tab and start the validation.
  3. Enter your user name and password when asked (click 'No' when asked about a credentials file).
  4. After a few minutes the first four tests should have passed successfully. The last one will fail, but that can be ignored. If one of the other tests fails, click on 'Show Details' to get more information. If you need help, copy the Validation Results and send an e-mail to Stefan Harfst.
  5. After the successful validation, do not forget to set the maximum number of workers back to 36.

Once the validation has succeeded, you are ready to use Matlab on the cluster for your own applications. See the Matlab examples in the Wiki.
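
As a first check with your own code, a small job can be sent to the cluster from the client. This is a minimal sketch, assuming the profile was named HERO as above. It uses the standard Parallel Computing Toolbox functions parcluster, batch, wait, and fetchOutputs. Matlab may ask for your cluster user name and password when the job is submitted, as it does during the validation.

```matlab
% Load the cluster profile created above (assumed name: HERO).
c = parcluster('HERO');

% Submit a function call as a batch job: rand(3) with one output argument.
job = batch(c, @rand, 1, {3});

% Block until the job on the cluster has finished.
wait(job);

% fetchOutputs returns a cell array with one entry per output argument.
out = fetchOutputs(job);
disp(out{1});

% Delete the job once the results are no longer needed.
delete(job);
```

If this prints a 3-by-3 matrix of random numbers, the round trip from the client to the cluster and back works.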