The software PALM is a large-eddy simulation (LES) model for atmospheric and oceanic flows developed at the [http://www.muk.uni-hannover.de Institute of Meteorology and Climatology] of the Leibniz Universität Hannover.


== Installation ==


Please follow the detailed instructions given in the following pdf-document:


* [[Media:PALM_installation_on_FLOW_2015_01_30_bwi.pdf|Installation of PALM on FLOW]]


== SGE scripts ==


'''With recent PALM versions''' (revision 1100 or newer) PALM jobs are submitted from the local computer. SGE scripts will be generated automatically, so '''you don't need to create an SGE script by yourself'''.
<br />
<br />
'''If you use a PALM version older than revision 1100 (which is not recommended!)''', a sample SGE script for submitting PALM jobs can be found here:
* [[Palm.sge|palm.sge]]
Please copy the sample script to your working directory (as palm.sge or <different-name>.sge). For carrying out the test run (to verify the installation), the script does not need to be modified. Please see the [[Media:PALM installation on FLOW 090113.pdf|old installation guide]] for instructions on how to modify the script for different runs.


== Submitting PALM jobs ==


PALM jobs are submitted from your local computer with the script ''mrun''. A typical ''mrun'' call looks like this:
  mrun -z -d <job name> -h lcflow -K parallel -c ".mrun.config.forwind_flow" -X <number of slots> -t <CPU time in s> -r "d3# <output file list>"
<output file list> can be one or more of the following strings (separated by blanks): "<tt>3d#</tt>" (3d data), "<tt>xy#</tt>", "<tt>xz#</tt>", "<tt>yz#</tt>" (cross sections), "<tt>ma#</tt>" (masked data), "<tt>pr#</tt>" (profiles), "<tt>ts#</tt>" (time series), "<tt>sp#</tt>" (spectra). If you want to restart jobs or use turbulent inflow, the output of binary data for restarts can be switched on by simply adding "<tt>restart</tt>" to the output file list. For a restart run, all "<tt>#</tt>" have to be replaced by "<tt>f</tt>". A run with turbulent inflow (which uses data of a precursor run for initialization) requires an "<tt>rec</tt>".
'''Example:''' The mrun call for a run with turbulent inflow and desired output of 3d data, profiles and time series as well as binary data for possible restarts would look like this:
  mrun -z -d example2 -h lcflow -K parallel -c ".mrun.config.forwind_flow" -X 144 -t 86400 -r "d3# rec 3d# pr# ts# restart"
In this case, the job "''example2''" will run on 144 slots (= 12 cores) for 24 hours.
<br />
In the submission process you will be asked to specify the memory demand in MBytes. If the memory demand is higher than 1850 MB, the job can only run on the high memory nodes. For most PALM jobs, 1850 MB should be sufficient.
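<br />
'''Example (restart):''' Following the rules above (all "<tt>#</tt>" replaced by "<tt>f</tt>"), a hypothetical later restart of the job ''example2'' could be submitted like this:
  mrun -z -d example2 -h lcflow -K parallel -c ".mrun.config.forwind_flow" -X 144 -t 86400 -r "d3f 3df prf tsf restart"
Keeping "<tt>restart</tt>" in the output file list again writes binary data for further restarts.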
 
== Runtime estimation ==
 
The runtime of PALM (which is needed for the SGE script and for mrun) can be estimated by
 
<center><math>
T_{CPU}[\mbox{s}] = c_{PALM,FLOW}\frac{N_{Iterations}\cdot N_{Points}}{N_{CPU}}
</math></center>
 
where the constant <math>c_{PALM,FLOW}</math> is approximately
 
<center><math>
c_{PALM,FLOW} \approx 8\cdot 10^{-6}\mbox{s}
</math></center>
 
This value is a first guess based on a sample of simulation data and might have to be corrected in the future. It also depends on additional parameters such as the amount of output data and the complexity of user-defined code.
 
The number of points is defined by the product of the grid points in ''x''-, ''y''- and ''z''-direction
 
<center><math>
N_{Points} = N_x \cdot N_y \cdot N_z
</math></center>
 
The number of iterations can be calculated by 
 
<center><math>
N_{Iterations} = \frac{T_{total}}{\Delta t}
</math></center>
 
with the physical simulation time <math>T_{total}</math> and the timestep size <math>\Delta t</math>. The timestep size <math>\Delta t</math> can (in most cases) be estimated by the Courant-Friedrichs-Lewy-like criterion
 
<center><math>
\Delta t = \frac{\max\left(\Delta x\right)}{2\bar u_{max}} =\frac{\max\left(\frac{L_x}{N_x},\frac{L_y}{N_y}, \frac{L_z}{N_z}\right)}{2\bar u_{max}}
</math></center>
 
where ''L'' and ''N'' are the length of the simulated domain and the number of grid points in ''x''-, ''y''- and ''z''-direction, respectively. The velocity <math>\bar u_{max}</math> is the maximum wind speed of the simulation.
 
'''Note:''' The estimation assumes linear scaling, which does not hold for large numbers of CPU cores combined with small workloads per core (<math>O(10^5)</math> points/core). In this case the constant can be larger.
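 
'''Example (illustrative numbers only):''' Consider a domain of 5120 m × 5120 m × 2560 m resolved with <math>512 \times 512 \times 256</math> grid points (10 m spacing) and a maximum wind speed of 5 m/s. The criterion above gives <math>\Delta t = 10/(2\cdot 5)\,\mbox{s} = 1\,\mbox{s}</math>, so a simulated time of 4 hours requires <math>N_{Iterations} = 14400</math>. On <math>N_{CPU} = 144</math> cores the estimate is
 
<center><math>
T_{CPU} \approx 8\cdot 10^{-6}\,\mbox{s}\cdot\frac{14400 \cdot 512 \cdot 512 \cdot 256}{144} \approx 5.4\cdot 10^{4}\,\mbox{s} \approx 15\,\mbox{h}
</math></center>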


== Known issues ==
* When you have made changes to ''.mrun_config'', don't forget to run ''mbuild'' once again afterwards:
<pre>
mbuild -u -h lcflow   # deploy the updated scripts to FLOW
mbuild -h lcflow      # rebuild the pre-compiled PALM code on FLOW
</pre>
:Before doing that, you should delete the folder <tt>MAKE_DEPOSITORY</tt> on the target system (e.g. FLOW) and the *.x and *.o files in the folders <tt>trunk/SCRIPTS</tt> and <tt>trunk/UTIL</tt> on both FLOW and your local computer.
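:A minimal sketch of this cleanup (paths are relative to your PALM directory and may differ in your installation):
<pre>
# on FLOW:
rm -rf MAKE_DEPOSITORY
rm -f trunk/SCRIPTS/*.x trunk/SCRIPTS/*.o trunk/UTIL/*.x trunk/UTIL/*.o
# on your local computer:
rm -f trunk/SCRIPTS/*.x trunk/SCRIPTS/*.o trunk/UTIL/*.x trunk/UTIL/*.o
</pre>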
 
* With the Intel Compiler 12.0.0 the compiler flags ''-no-prec-div'' and ''-no-prec-sqrt'' can lead to different results for identical runs. Please don't use these flags. Note that these flags are set automatically when using the compiler option ''-fast''. In this case you should set ''-prec-div'' and ''-prec-sqrt'' explicitly.
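:For example, to keep ''-fast'' while forcing precise division and square roots (a sketch; with the Intel compiler, flags given later on the command line should override earlier ones):
<pre>
ifort -fast -prec-div -prec-sqrt ...
</pre>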
 
* When submitting PALM jobs from your local computer, job protocols are sometimes not transferred back to the local host via scp. In this case, they remain in the ''job_queue'' folder on FLOW.
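:You can then fetch them manually, e.g. with a call like the following (host name and path are assumptions here; adjust them to your setup):
<pre>
scp <user>@flow.hpc.uni-oldenburg.de:job_queue/<protocol-file> .
</pre>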
 
== Debugging of PALM ==
Sometimes it is necessary to debug the code, especially when using your own ''user code''. Here are some hints for debugging PALM in parallel runs:


<ol>
  <li> The simplest way is to add print statements in the user code, at least at the beginning and at the end of each procedure. However, in many cases this method is not very useful.
  </li>
  <li> Include debug symbols in the executable; most debuggers require this. For this you have to add the compiler options ''-g'' and ''-traceback'' to the definitions ''%fopts'' and ''%lopts'' in the ''.mrun_config'' file (see the sketch after this list). You may also have to reduce the optimization level (compiler options ''-O3'', ''-Ofast'', ''-align all'', ''-ftz'', ''-fno-alias'', ''-no-scalar-rep'', ''-no-prec-sqrt'', ''-ip'', ''-ipo'') to ''-O2'' to get the right output in the debugger. Don't forget to build the code again (see [[PALM#Known_issues | Known issues]]).
  </li>
  <li> To enable additional checks (e.g. array bounds) at runtime, add the compiler option ''-check'' to the definitions ''%fopts'' and ''%lopts'' in the ''.mrun_config'' file. Note that the code will run slower; this option is only useful for debugging, not for normal runs. Again, don't forget to build the code again (see [[PALM#Known_issues | Known issues]]).
  </li>
  <li> Use the debugging tool ''valgrind''. This tool enables different checks of the code (see [[valgrind]]), especially checks for invalid memory usage. To use it, please take the following steps:
  <ol style="list-style-type:lower-roman">
    <li> add ''valgrind'' to the definition ''%modules'' in the ''.mrun_config'' file.</li>
    <li> add compiler option ''-g'' (see above)</li>
    <li> modify the script ''mrun''<br /><br />
    <pre>
....         
elif [[ $host = lcflow ]]
then
  mpirun -np $ii a.out  < runfile_atmos  $ROPTS
elif ....
    </pre>
<br />
to
<br />
    <pre>
....         
elif [[ $host = lcflow ]]
then
  mpirun -np $ii valgrind -v --leak-check=full --log-file="valgrind.out.%q{PMI_RANK}" a.out  < runfile_atmos  $ROPTS
elif ....
    </pre>
<br />
The runtime of the program increases heavily (by a factor of 10 or more). Valgrind will write a file ''valgrind.out.XX'' for each MPI process to the temporary working directory of PALM. Please don't forget to deploy the scripts again with ''mbuild -u -h lcflow''.
<br />'''Note:''' [[valgrind]] offers a huge number of different debugging tools by command line options. For using other checkers you have to adjust the code line above.
  </li>
    <li> Start the job with ''mrun'' and the additional option ''-B'' to prevent deletion of the temporary working directory (and hence of the valgrind output).</li>
    <li> Analyze the output of valgrind (e.g. search for ''invalid write'').</li>
  </ol>
  </li>
</ol>
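A minimal sketch of the ''%fopts''/''%lopts'' lines in ''.mrun_config'' with the debug options from points 2 and 3 added (the full option list and the host/mode columns depend on your configuration; ''lcflow parallel'' is taken from the examples above):
<pre>
%fopts  -g:-traceback:-check:-O2  lcflow parallel
%lopts  -g:-traceback:-check:-O2  lcflow parallel
</pre>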


== Tutorials ==
Here are slides from the last training at ForWind in April 2012.

=== Day 1 ===
* [[Media:PALM_Seminar_ForWind_04_2012_Day1_01_PALM_introduction_timetable.ppt|Introduction]]
* [[Media:PALM_Seminar_ForWind_04_2012_Day1_02_PALM_overview.ppt|Overview]]
* [[Media:PALM_Seminar_ForWind_04_2012_Day1_03_PALM_installation_on_FLOW_090312.ppt|Installation on FLOW]] (Please see above for updated installation rules!)
* [[Media:PALM_Seminar_ForWind_04_2012_Day1_05_Introduction_to_NCL_ForWind_2012.ppt|Introduction to NCL]]

=== Day 2 ===

=== Day 3 ===

== External Links ==


