Configuration MDCS 2016
Introduction
In general, jobs on the HPC clusters CARL and EDDY have to be submitted via the job scheduler SLURM, which takes care of sharing the available resources (mainly CPU cores and memory). For Matlab jobs, the Matlab Distributed Computing Server (MDCS) provides an easy interface for users running Matlab on their local computer (referred to as the client). A few Matlab commands on the client generate a set of job files, which are transferred to the cluster, where a SLURM job is then submitted. Once this job has completed, the results are automatically transferred back to the client.
For this process to work, the client has to be configured first. The configuration steps below are written for Matlab R2016b; with small adjustments they should also work for other versions of Matlab, provided that the same version is available on the cluster.
Prerequisites
The following preparations are required before the configuration of the client can be done:
- Some of the configuration steps require administrator rights on your local machine. Talk to your local system administrator if you do not have these rights.
- Install on your local machine (the client) a version of Matlab that is supported on the cluster. The currently supported version is R2016b. For installation instructions, refer to the web page of the IT services (follow the link to the download section; a login is required).
- Identify the Matlab root directory (MATLABROOTDIR below) on your machine. On Linux systems this is usually /usr/local/MATLAB/R2016b/ and on Windows systems C:\Program Files\MATLAB\R2016b. Typing matlabroot in the Matlab command window displays it.
- You must be able to connect from your local machine to the login nodes of the cluster via ssh. See Logging in to the system for more information. Windows users can use programs like PuTTY or MobaXterm (recommended) for this purpose.
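To confirm the ssh prerequisite from a script, the check can be sketched in Python as below. This is a minimal sketch, not part of the MDCS integration: it assumes an OpenSSH-compatible ssh command is on your PATH (standard on Linux; on Windows it may have to be installed separately), and the helper names ssh_check_command and can_connect are hypothetical. The host name is the CARL address used later in this guide.

```python
import subprocess
from typing import List

# CARL login address as used in this guide; EDDY users replace "carl" with "eddy".
DEFAULT_HOST = "carl.hpc.uni-oldenburg.de"


def ssh_check_command(user: str, host: str = DEFAULT_HOST, timeout_s: int = 10) -> List[str]:
    """Build an ssh command that logs in, prints the remote host name, and exits.

    Assumption: an OpenSSH-style client that understands -o ConnectTimeout=N.
    """
    return ["ssh", "-o", f"ConnectTimeout={timeout_s}", f"{user}@{host}", "hostname"]


def can_connect(user: str, host: str = DEFAULT_HOST) -> bool:
    """Run the check interactively; ssh may prompt for your password on the terminal."""
    result = subprocess.run(ssh_check_command(user, host))
    return result.returncode == 0
```

Calling can_connect("abcd1234") (with your own user name) should print the name of a login node and return True; False usually means a network, host name, or credential problem that has to be solved before continuing.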
System-wide Integration
The following steps need to be done once (per version of Matlab) on a local machine running Matlab:
- Download the MDCS SLURM-integration files appropriate for the version of Matlab that you want to use:
- Matlab R2016b (and later): MDCS_SLURM-Integration_R2016b.zip
- Unpack the zip file. Besides a README, it should contain the following files:
- communicatingJobWrapper.sh
- communicatingSubmitFcn.m
- createSubmitScript.m
- deleteJobFcn.m
- extractJobId.m
- getJobStateFcn.m
- getRemoteConnection.m
- getSubmitString.m
- independentJobWrapper.sh
- independentSubmitFcn.m
- Copy all the files listed above (the README is not needed) to the directory MATLABROOTDIR/toolbox/local/ (MATLABROOTDIR\toolbox\local\ on Windows). This step requires administrator rights.
- If the version of Matlab you want to configure is currently running, then exit it (and restart it for the next section).
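If you prefer to script the copy step, it can be sketched in Python as follows. This is an optional, minimal sketch assuming the zip file has already been unpacked into a local directory and that the script runs with administrator rights (e.g. via sudo on Linux or an elevated prompt on Windows). The helper names missing_files and install_integration are hypothetical; the file list is taken from the list above.

```python
import shutil
from pathlib import Path
from typing import List

# File names as listed for the unpacked MDCS_SLURM-Integration_R2016b.zip (README excluded).
INTEGRATION_FILES = [
    "communicatingJobWrapper.sh",
    "communicatingSubmitFcn.m",
    "createSubmitScript.m",
    "deleteJobFcn.m",
    "extractJobId.m",
    "getJobStateFcn.m",
    "getRemoteConnection.m",
    "getSubmitString.m",
    "independentJobWrapper.sh",
    "independentSubmitFcn.m",
]


def missing_files(unpacked_dir: str) -> List[str]:
    """Return the integration files that are not present in the unpacked directory."""
    src = Path(unpacked_dir)
    return [name for name in INTEGRATION_FILES if not (src / name).is_file()]


def install_integration(unpacked_dir: str, matlabroot: str) -> List[Path]:
    """Copy the integration files to MATLABROOTDIR/toolbox/local and return the copied paths."""
    missing = missing_files(unpacked_dir)
    if missing:
        raise FileNotFoundError("missing integration files: " + ", ".join(missing))
    target = Path(matlabroot) / "toolbox" / "local"
    if not target.is_dir():
        raise NotADirectoryError(f"{target} not found; check MATLABROOTDIR")
    copied = []
    for name in INTEGRATION_FILES:
        # copy2 keeps timestamps and permission bits and overwrites existing files.
        copied.append(Path(shutil.copy2(Path(unpacked_dir) / name, target / name)))
    return copied
```

For example, install_integration("MDCS_SLURM-Integration_R2016b", "/usr/local/MATLAB/R2016b") would copy the files on a Linux client with the default installation path; the first argument is wherever you unpacked the zip file. Checking for missing files before copying anything avoids a half-installed integration.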
User-specific configuration
The following steps have to be performed once by every user after the system-wide integration (see above) has been completed. The details of this part may differ somewhat between versions of Matlab; the steps below are for R2016b.
- (Re)Start Matlab R2016b on your local machine (this is required so that the previous configuration becomes active).
- On the HOME tab, in the Environment section, click Parallel --> Manage Cluster Profiles.
- In the new window click on Add --> Custom --> Generic.
- A new cluster profile named GenericProfile appears in the list. Right-click it, select Rename and enter 'CARL' (without the quotes) as the new name. If you prefer a different name (e.g. EDDY), replace CARL with that name in the later examples.
- Right-click on the new profile name and select Edit.
- Now enter the following information (from top to bottom):
- Description: CARL (can be anything really)
- JobStorageLocation: a directory on your local machine where Matlab can store data from submitted jobs. On Linux systems this could be:
- /home/USERNAME/MATLAB/R2016b/JobData
- and on Windows systems:
- C:\Users\USERNAME\Documents\MATLAB\R2016b\JobData
- Create the directory if it does not exist. It is strongly recommended to use a separate directory for each version of Matlab.
- NumWorkers: 36 (this is the maximum number of workers per job)
- ClusterMatlabRoot: the directory where Matlab is installed on the cluster; for R2016b, enter the following:
- Skip to Submit Functions and enter for IndependentSubmitFcn:
- {@independentSubmitFcn, 'carl.hpc.uni-oldenburg.de', '/user/abcd1234/MATLAB/R2016b/JobData'}
- and for the CommunicatingSubmitFcn:
- {@communicatingSubmitFcn, 'carl.hpc.uni-oldenburg.de', '/user/abcd1234/MATLAB/R2016b/JobData'}
- EDDY users should replace carl with eddy. The last entry is the directory on the cluster where Matlab stores job data at run time. You can use /user, /data, or /work for this purpose (one of the latter two is recommended). Replace "abcd1234" with your own user name. Create the directory on the cluster if it does not exist, and use a separate directory for each version of Matlab.
- OperatingSystem: set to 'unix'
- HasSharedFilesystem: set to 'false'
- Skip ahead to Jobs and Task Functions and enter for GetJobStateFcn:
- @getJobStateFcn
- and for DeleteJobFcn:
- @deleteJobFcn
- Click on Done.
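To avoid typing errors in the profile fields, the values from the steps above can be generated for your own user name, as in the Python sketch below. It only formats text for you to paste into the Cluster Profile Manager; it does not change Matlab settings. The helper names are hypothetical, the default remote directory mirrors the /user example (pass a /data or /work path explicitly instead), and ClusterMatlabRoot is deliberately left out. The ssh call for creating the cluster-side directory assumes an OpenSSH-style client.

```python
from pathlib import PurePosixPath, PureWindowsPath
from typing import Dict, List, Optional


def local_job_storage(home: str, release: str = "R2016b", windows: bool = False) -> str:
    """Client-side JobStorageLocation, following the example paths in this guide."""
    if windows:
        return str(PureWindowsPath(home, "Documents", "MATLAB", release, "JobData"))
    return str(PurePosixPath(home, "MATLAB", release, "JobData"))


def submit_fcn(kind: str, host: str, remote_dir: str) -> str:
    """Format a submit function entry; kind is 'independent' or 'communicating'."""
    return "{@%sSubmitFcn, '%s', '%s'}" % (kind, host, remote_dir)


def profile_values(user: str, home: str, cluster: str = "carl", release: str = "R2016b",
                   remote_dir: Optional[str] = None, windows: bool = False) -> Dict[str, object]:
    """Collect the profile entries described above (ClusterMatlabRoot omitted)."""
    host = f"{cluster}.hpc.uni-oldenburg.de"
    if remote_dir is None:
        # Mirrors the /user example; a /data or /work directory is recommended instead.
        remote_dir = f"/user/{user}/MATLAB/{release}/JobData"
    return {
        "Description": cluster.upper(),
        "JobStorageLocation": local_job_storage(home, release, windows),
        "NumWorkers": 36,  # maximum number of workers per job, as stated in this guide
        "IndependentSubmitFcn": submit_fcn("independent", host, remote_dir),
        "CommunicatingSubmitFcn": submit_fcn("communicating", host, remote_dir),
        "OperatingSystem": "unix",
        "HasSharedFilesystem": False,
        "GetJobStateFcn": "@getJobStateFcn",
        "DeleteJobFcn": "@deleteJobFcn",
    }


def remote_mkdir_command(user: str, remote_dir: str, cluster: str = "carl") -> List[str]:
    """ssh command that creates the cluster-side job data directory if it is missing."""
    return ["ssh", f"{user}@{cluster}.hpc.uni-oldenburg.de", "mkdir", "-p", remote_dir]
```

For example, printing each key and value of profile_values("abcd1234", "/home/abcd1234") lists the entries to copy, and passing remote_mkdir_command("abcd1234", "/user/abcd1234/MATLAB/R2016b/JobData") to subprocess.run creates the cluster directory. Generating both submit functions from one remote_dir keeps the two entries consistent.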
Validation
After completing the previous steps, you can validate your configuration.
- Click on the Validation tab. You will see the five validation steps (up to four of which can be deselected if desired).
- Enter '4' in the field 'Number of workers to use' (a larger number of workers also works, but the validation will then take much longer).
- Click on the check mark to start validating the profile.
- When prompted, enter your user name and then your password for the cluster.
- After a few minutes the first four tests should have passed successfully, the last one may fail but that can be ignored (and skipped). If one of the other tests fails, click on 'Show Details' to get more information. If you need help, copy the Validation Results and send an e-mail to Scientific Computing.
Once the validation has succeeded, you are ready to use Matlab on the cluster for your own applications. See the Matlab examples in the Wiki.