Difference between revisions of "Advanced Examples MDCS 2016"
Revision as of 14:23, 7 March 2017
You will find a few examples for Matlab applications using MDCS on this page. Every example illustrated below was successfully tested on CARL and EDDY.
Example application: 2D random walk
Consider the Matlab .m-file myExample_2DRandWalk.m (listed below), which among other things illustrates the use of sliced variables and independent streams of random numbers for use with parfor-loops.
This example program generates N independent 2D random walks (a single step has step length 1 and a random direction). Each random walk performs tMax steps. At each step t, the radius of gyration (Rgyr) of walk i is stored in the array Rgyr_t in the entry Rgyr_t(i,t).
While the whole data set is available for further postprocessing, only the average radius of gyration Rgyr_av and the respective standard error Rgyr_sErr for the time steps 1...tMax are computed immediately (below it will also be shown how to store the data in an output file on HERO for further postprocessing).
    %% FILE: myExample_2DRandWalk.m
    % BRIEF: illustrate sliced variables and independent streams
    %        of random numbers for use with parfor-loops
    %
    % DEPENDENCIES:
    % singleRandWalk.m - implements single random walk
    % averageRgyr.m    - computes average radius of gyration
    %                    for time steps 1...tMax
    %
    % AUTHOR: Oliver Melchert
    % DATE: 2013-06-05
    %
    N = 10000;               % number of independent walks
    tMax = 100;              % number of steps in individual walk
    Rgyr_t = zeros(N,tMax);  % matrix to hold results: row=radius
                             % of gyration as fct of time;
                             % col=independent random walk instances
    parfor n=1:N
        % create random number stream seeded by the
        % current value of n; you can obtain a list
        % of all possible random number streams by
        % typing RandStream.list in the command window
        myStream = RandStream('mt19937ar','Seed',n);
        % obtain radius of gyration as fct of time for
        % different independent random walks (independence
        % of RWs is ensured by considering different
        % random number streams for each RW instance)
        Rgyr_t(n,:) = singleRandWalk(myStream,tMax);
    end
    % compute average Rgyr and its standard error for all steps
    [Rgyr_av,Rgyr_sErr] = averageRgyr(Rgyr_t);
As listed above, the .m-file depends on the following files:
- singleRandWalk.m, implementing a single random walk, reading:
    function [Rgyr_t]=singleRandWalk(randStream,tMax)
    % Usage: [Rgyr_t]=singleRandWalk(randStream,tMax)
    % Input:
    %  randStream - random number stream
    %  tMax       - number of steps in random walk
    % Output:
    %  Rgyr_t     - array holding the radius of gyration
    %               for all considered time steps
    x=0.;y=0.;   % initial walker position
    Rgyr_t = zeros(tMax,1);
    for t = 1:tMax
        % implement random step
        phi=2.*pi*rand(randStream);
        x = x+cos(phi);
        y = y+sin(phi);
        % record radius of gyration for current time
        Rgyr_t(t)=sqrt(x*x+y*y);
    end
    end
- averageRgyr.m, which computes the average radius of gyration of the random walks for time steps 1...tMax, reading:
    function [avList,stdErrList]=averageRgyr(rawDat)
    % Usage: [avList,stdErrList]=averageRgyr(rawDat)
    % Input:
    %  rawDat - array of size [N,tMax] where N is the
    %           number of independent random walks and
    %           tMax is the number of steps taken by an
    %           individual walk
    % Returns:
    %  avList     - average radius of gyration for the steps
    %  stdErrList - respective standard error for the steps
    [Lx,Ly]=size(rawDat);
    avList = zeros(Ly,1);
    stdErrList = zeros(Ly,1);
    for i = 1:Ly
        [av,var,stdErr] = basicStats(rawDat(:,i));
        avList(i) = av;
        stdErrList(i) = stdErr;
    end
    end

    function [av,var,stdErr]=basicStats(x)
    % usage: [av,var,stdErr]=basicStats(x)
    % Input:
    %  x - list of numbers
    % Returns:
    %  av     - average
    %  var    - variance
    %  stdErr - standard error
    av=sum(x)/length(x);
    var=sum((x-av).^2)/(length(x)-1);
    stdErr=sqrt(var/length(x));
    end
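The statistics returned by averageRgyr.m can be cross-checked against MATLAB's built-in mean and std functions. The following is a minimal sketch and not part of the original example; the small matrix rawDat is made-up illustration data, and the variable nWalks is introduced here only for readability.

```matlab
% Sketch: column-wise average and standard error using built-ins.
% rawDat is a made-up 3x2 matrix (3 walks, 2 time steps), for illustration only.
rawDat = [1 2; 2 4; 3 6];
nWalks = size(rawDat,1);

avList     = mean(rawDat,1).';                % average per time step (column vector)
stdErrList = std(rawDat,0,1).'/sqrt(nWalks);  % std with N-1 normalization, divided by sqrt(N)

disp([avList, stdErrList]);
```

Both quantities use the same definitions as basicStats above (variance normalized by N-1, standard error as the square root of variance over N), so the results should agree up to rounding.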
For test purposes one might execute myExample_2DRandWalk.m directly from within a Matlab session on a local desktop PC.
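When testing locally, an independent check on the output is helpful. For a 2D random walk with unit steps and uniformly distributed directions, the mean squared distance from the origin after t steps equals t. The sketch below checks this with a vectorized serial computation that does not need the cluster. It is an illustration under assumptions: the reduced N, the single random stream with seed 1 (the parfor example uses one stream per walk instead) and the names R2_av and relDev are not taken from the original code.

```matlab
% Sketch: serial sanity check for the random walk statistics.
N    = 2000;   % number of walks (reduced for a quick local test)
tMax = 100;    % number of steps per walk

s   = RandStream('mt19937ar','Seed',1);  % single stream, unlike the parfor example
phi = 2*pi*rand(s,N,tMax);               % random step directions

x = cumsum(cos(phi),2);                  % walker positions, one row per walk
y = cumsum(sin(phi),2);
R2_av = mean(x.^2 + y.^2, 1);            % mean squared distance per time step

% for unit steps with uniform random directions, <R^2>(t) = t
relDev = abs(R2_av(end)/tMax - 1);
fprintf('relative deviation of <R^2> from tMax: %.3f\n', relDev);
```

With a few thousand walks the deviation is expected to be at the level of a few percent; a much larger value would point to an error in the step update.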
Specifying file dependencies
To submit the respective job to the local HPC system one might assemble the following job submission script, called mySubmitScript_v1.m:
    sched = parcluster('CARL');
    jobRW =...
      batch(...
        sched,...
        'myExample_2DRandWalk',...
        'pool',2,...
        'FileDependencies',{...
                           'singleRandWalk.m',...
                           'averageRgyr.m'...
                           }...
        );
In the above job submission script, all dependent files are listed as FileDependencies. I.e., the .m-files specified therein are copied from your local desktop PC to the HPC system at run time.
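Because the listed files are taken from the local machine, a quick check that MATLAB can actually see them before submitting may save a failed job. A minimal sketch, assuming it is run from the folder that contains the .m-files; the variable names deps, isFound and missing are illustrative:

```matlab
% Sketch: verify that all dependency files are visible to MATLAB before submitting.
deps = {'singleRandWalk.m','averageRgyr.m'};

% exist(name,'file') returns 2 for a file found in the current folder or on the path
isFound = cellfun(@(f) exist(f,'file') == 2, deps);
missing = deps(~isFound);

if isempty(missing)
    disp('all dependency files found');
else
    fprintf('missing dependency file: %s\n', missing{:});
end
```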
Now, from within a Matlab session I navigate to the folder in which the above .m-files are located and call the job submission script, i.e.:
    >> cd MATLAB/R2011b/example/myExamples_matlab/RandWalk/
    >> mySubmitScript_v1
    runtime = 24:0:0 (default)
    memory = 1500M (default)
    diskspace = 50G (default)
Before the job is actually submitted, I need to specify my user ID and password. Once the job is successfully submitted, I can check its state by typing jobRW.state. For more information on the status of your job, you can log in to the HPC system and type the command qstat on the command line. This yields several details related to your job, which you can process further to see on which execution nodes the job runs, why it does not start directly, etc. Note that MATLAB provides only a wrapper for the qstat command, which in some cases results in misleading output. E.g., if, for some reason, your job changes to the error-state, MATLAB might erroneously report it to be in the finished-state.
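Independent of the scheduler, the job object itself records error messages raised on the workers, which gives a second way to tell a failed job from a finished one. The sketch below collects these messages. It assumes the Parallel Computing Toolbox job interface, in which a job has a Tasks property and each task an ErrorMessage property; the helper name collectTaskErrors is hypothetical, and mockJob is a stand-in struct used only so that the snippet runs without a cluster.

```matlab
% Sketch: list error messages recorded in the tasks of a job.
% collectTaskErrors is a hypothetical helper, not part of MATLAB.
collectTaskErrors = @(job) arrayfun(@(t) t.ErrorMessage, job.Tasks, ...
                                    'UniformOutput', false);

% usage with a real job object (requires the cluster), e.g.:
%   wait(jobRW);                      % block until the job reaches the finished state
%   msgs = collectTaskErrors(jobRW);

% stand-in struct mimicking the relevant fields, for illustration only
mockJob.State = 'finished';
mockJob.Tasks = struct('ErrorMessage', {'', 'illustrative error message'});

msgs   = collectTaskErrors(mockJob);
failed = msgs(~cellfun(@isempty, msgs));  % non-empty entries indicate failed tasks
fprintf('%d of %d tasks reported an error\n', numel(failed), numel(msgs));
```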
Once the job (really) has finished, i.e.,
    >> jobRW.state
    ans =
    finished
I might go on and load the results to my desktop computer, giving
    >> res=load(jobRW);
    >> res
    res =
                N: 10000
          Rgyr_av: [100x1 double]
        Rgyr_sErr: [100x1 double]
           Rgyr_t: [10000x100 double]
              ans: 'finished'
              res: [1x1 struct]
             tMax: 100
Note, however, that the use of FileDependencies has several drawbacks, for example:
- each worker gets its own copy of the respective .m-files when the job starts (in particular, workers that participate in the computing process do not share a set of .m-files in a common location),
- the respective .m-files are not available on the HPC system once the job has finished,
- comparatively large input files need to be copied to the HPC system over and over again, if several computations on the same set of input data are performed.
In many cases a different procedure, based on specifying PathDependencies and outlined in detail below, might be recommended.