The software PALM is a large-eddy simulation (LES) model for atmospheric and oceanic flows developed at the [http://www.muk.uni-hannover.de Institute of Meteorology and Climatology] of the Leibniz Universität Hannover.
== Installation ==
Please follow the detailed instructions given in the following PDF document:
* [[Media:PALM_installation_on_FLOW_2015_01_30_bwi.pdf|Installation of PALM on FLOW]]
== SGE scripts ==
'''With recent PALM versions''' (revision 1100 or newer) PALM jobs are submitted from the local computer. SGE scripts will be generated automatically, so '''you don't need to create an SGE script by yourself'''.
<br />
<br />
'''If you use a PALM version older than revision 1100 (which is not recommended!)''', a sample SGE script for submitting PALM jobs can be found here:
* [[Palm.sge|palm.sge]]
Please copy the sample script to your working directory (as palm.sge or <different-name>.sge). For the test run that verifies the installation, the script does not need to be modified. Please see the [[Media:PALM installation on FLOW 090113.pdf|old installation guide]] for instructions on how to modify the script for other runs.
== Submitting PALM jobs ==
PALM jobs are submitted from your local computer with the script ''mrun''. A typical mrun call looks like this:
 mrun -z -d <job name> -h lcflow -K parallel -c ".mrun.config.forwind_flow" -X <number of slots> -t <CPU time in s> -r "d3# <output file list>"
<output file list> can be one or several of the following strings (separated by blanks): "<tt>3d#</tt>" (3d data), "<tt>xy#</tt>", "<tt>xz#</tt>", "<tt>yz#</tt>" (cross sections), "<tt>ma#</tt>" (masked data), "<tt>pr#</tt>" (profiles), "<tt>ts#</tt>" (time series), "<tt>sp#</tt>" (spectra). If you want to restart jobs or use turbulent inflow, the output of binary data for restarts can be switched on by simply adding "<tt>restart</tt>" to the output file list. For a restart run, every "<tt>#</tt>" has to be replaced by "<tt>f</tt>". A run with turbulent inflow (which uses data of a precursor run for initialization) additionally requires "<tt>rec</tt>".
'''Example:''' The mrun call for a run with turbulent inflow and desired output of 3d data, profiles and time series, as well as binary data for possible restarts, would look like this:
 mrun -z -d example2 -h lcflow -K parallel -c ".mrun.config.forwind_flow" -X 144 -t 86400 -r "d3# rec 3d# pr# ts# restart"
In this case, the job "''example2''" will run on 144 slots (= 12 nodes with 12 cores each) for 24 hours.
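A later restart of this job would switch the activation strings from "<tt>#</tt>" to "<tt>f</tt>" as described above. A corresponding call might look like this (a sketch only, assuming the same configuration and output selection; whether "<tt>rec</tt>" is needed again depends on your inflow setup):
 mrun -z -d example2 -h lcflow -K parallel -c ".mrun.config.forwind_flow" -X 144 -t 86400 -r "d3f 3df prf tsf restart"
Keeping "<tt>restart</tt>" in the list allows further restarts after this one.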
<br />
In the submission process you will be asked to specify the memory demand in MBytes. If the memory demand is higher than 1850 MB, the job can only run on the high-memory nodes. For most PALM jobs, 1850 MB should be sufficient.
== Runtime estimation ==
The runtime of PALM (which is needed for the SGE script and for mrun) can be estimated by
<center><math>
T_{CPU}[\mbox{s}] = c_{PALM,FLOW}\frac{N_{Iterations}\cdot N_{Points}}{N_{CPU}}
</math></center>
where the constant <math>c_{PALM,FLOW}</math> is approximately
<center><math>
c_{PALM,FLOW} \approx 8\cdot 10^{-6}\,\mbox{s}
</math></center>
This value is a first guess from a sample of simulation data and might have to be corrected in the future. It also depends on additional parameters such as the amount of output data and the complexity of user-defined code.
The number of points is the product of the numbers of grid points in ''x''-, ''y''- and ''z''-direction:
<center><math>
N_{Points} = N_x \cdot N_y \cdot N_z
</math></center>
The number of iterations can be calculated by
<center><math>
N_{Iterations} = \frac{T_{total}}{\Delta t}
</math></center>
with the physical simulation time <math>T_{total}</math> and the timestep size <math>\Delta t</math>. The timestep size can (in most cases) be estimated by the Courant-Friedrichs-Lewy-like criterion
<center><math>
\Delta t = \frac{\max\left(\Delta x\right)}{2\bar u_{max}} =\frac{\max\left(\frac{L_x}{N_x},\frac{L_y}{N_y}, \frac{L_z}{N_z}\right)}{2\bar u_{max}}
</math></center>
where ''L'' and ''N'' are the length of the simulated domain and the number of grid points in ''x''-, ''y''- and ''z''-direction, respectively. The velocity <math>\bar u_{max}</math> is the maximum wind speed of the simulation.
'''Note:''' The time estimation assumes linear scaling, which does not hold for large numbers of CPU cores combined with small per-core workloads (<math>O(10^5)</math> points per core). In this case the constant could be larger.
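As a worked example, the formula above can be evaluated with a small shell snippet. All input values below are made-up examples and have to be adjusted to your own run:

```shell
#!/bin/bash
# Worked example for the runtime formula above.
# All input values below are made-up examples -- adjust to your own run.
NX=144; NY=144; NZ=96     # grid points in x, y, z
LX=2880; LY=2880; LZ=960  # domain lengths in m
T_TOTAL=14400             # physical simulation time in s (4 h)
U_MAX=10                  # maximum wind speed in m/s (assumed)
N_CPU=144                 # number of slots
C_PALM=8e-6               # empirical constant c_PALM,FLOW in s

TCPU=$(awk -v nx="$NX" -v ny="$NY" -v nz="$NZ" \
           -v lx="$LX" -v ly="$LY" -v lz="$LZ" \
           -v t="$T_TOTAL" -v u="$U_MAX" -v n="$N_CPU" -v c="$C_PALM" 'BEGIN {
    dx = lx / nx; dy = ly / ny; dz = lz / nz      # grid spacings
    dmax = dx; if (dy > dmax) dmax = dy; if (dz > dmax) dmax = dz
    dt    = dmax / (2 * u)                        # CFL-like timestep estimate
    niter = t / dt                                # number of iterations
    npts  = nx * ny * nz                          # total number of grid points
    printf "%.0f", c * niter * npts / n           # T_CPU in s
}')
echo "Estimated CPU time: ${TCPU} s on ${N_CPU} slots"
```

With these example values the estimate is roughly 1600 s, which would be rounded up generously for the -t option of mrun.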
== Known issues ==
* When you have made changes in ''.mrun_config'', don't forget to run ''mbuild'' again afterwards:
<pre>
mbuild -u -h lcflow
mbuild -h lcflow
</pre>
:Before doing that, you should delete the folder <tt>MAKE_DEPOSITORY</tt> on the target system (e.g. FLOW) as well as the *.x and *.o files in the folders <tt>trunk/SCRIPTS</tt> and <tt>trunk/UTIL</tt> on both FLOW and your local computer.
* With the Intel Compiler 12.0.0, the compiler flags ''-no-prec-div'' and ''-no-prec-sqrt'' can lead to different results for identical runs. Please don't use these flags. Note that they are set automatically by the compiler option ''-fast''; in this case you should set ''-prec-div'' and ''-prec-sqrt'' explicitly.
* When submitting PALM jobs from your local computer, job protocols are sometimes not transferred back to the local host via scp. In this case, they remain in the <tt>job_queue</tt> folder on FLOW.
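Put together, the rebuild described in the first item above might look like this (a sketch only; the PALM directory <tt>~/palm/current_version</tt> is an assumed example path):
<pre>
# on FLOW: delete the old make depository (path is an assumed example)
rm -rf ~/palm/current_version/MAKE_DEPOSITORY
# on FLOW and on the local computer: remove old objects and executables
rm -f ~/palm/current_version/trunk/SCRIPTS/*.x ~/palm/current_version/trunk/SCRIPTS/*.o
rm -f ~/palm/current_version/trunk/UTIL/*.x ~/palm/current_version/trunk/UTIL/*.o
# on the local computer: deploy the scripts and rebuild
mbuild -u -h lcflow
mbuild -h lcflow
</pre>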
== Debugging of PALM ==
Sometimes it is necessary to debug the code, especially when using your own ''user code''. Here are some hints for debugging PALM when running in parallel:
<ol>
<li> The simplest way is to add print statements to the user code, at least at the beginning and at the end of each procedure. However, this method is often not very useful.
</li>
<li> Use debug symbols within the executable. This is required by most debuggers. To do so, add the compiler options ''-g'' and ''-traceback'' to the definitions ''%fopts'' and ''%lopts'' in the ''.mrun_config'' file. You may have to reduce the optimization level (compiler options ''-O3'', ''-Ofast'', ''-align all'', ''-ftz'', ''-fno-alias'', ''-no-scalar-rep'', ''-no-prec-sqrt'', ''-ip'', ''-ipo'') to ''-O2'' to get the right output in the debugger. Don't forget to build the code again (see [[PALM#Known_issues | Known issues]]).
</li>
<li> To enable additional runtime checks (e.g. array bounds), add the compiler option ''-check'' to the definitions ''%fopts'' and ''%lopts'' in the ''.mrun_config'' file. Note that the code will run slower; this option is only useful for debugging, not for normal runs. Again, don't forget to build the code again (see [[PALM#Known_issues | Known issues]]).
</li>
<li> Use the debugging tool ''valgrind''. This module enables different checks of the code (see [[valgrind]]), especially checks for invalid memory usage. To use this tool, follow these steps:
<ol style="list-style-type:lower-roman">
<li> add ''valgrind'' to the definition ''%modules'' in the ''.mrun_config'' file.</li>
<li> add the compiler option ''-g'' (see above)</li>
<li> modify the script ''mrun''<br /><br />
<pre>
....
elif [[ $host = lcflow ]]
then
    mpirun -np $ii a.out < runfile_atmos $ROPTS
elif ....
</pre>
<br />
to
<br />
<pre>
....
elif [[ $host = lcflow ]]
then
    mpirun -np $ii valgrind -v --leak-check=full --log-file="valgrind.out.%q{PMI_RANK}" a.out < runfile_atmos $ROPTS
elif ....
</pre>
<br />
The runtime of the program increases heavily (by a factor of 10 or more). Valgrind will now write a file ''valgrind.out.XX'' for each MPI process to the temporary working directory of PALM. Please don't forget to deploy the scripts again with ''mbuild -u -h lcflow''.
<br />'''Note:''' [[valgrind]] offers a large number of different debugging tools via command-line options. For using other checkers you have to adjust the code line above.
</li>
<li> start the job with ''mrun'' and the additional option ''-B'' to avoid deletion of the temporary working directory (and hence the output of valgrind).</li>
<li> analyze the output of valgrind (e.g. search for ''invalid write'')</li>
</ol>
</li>
</ol>
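Once the job has finished, the per-rank logs can be scanned for the most common findings, for example (assuming the ''valgrind.out.XX'' files are in the current directory; the quoted messages are standard valgrind wording):
<pre>
grep -l "Invalid write" valgrind.out.*      # which ranks produced invalid writes
grep -A 5 "definitely lost" valgrind.out.*  # leak summaries with context
</pre>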
== Tutorials ==
Here are slides from the last training at ForWind in April 2012.
=== Day 1 ===
* Fundamentals of LES
* [[Media:PALM_Seminar_ForWind_04_2012_Day1_01_PALM_introduction_timetable.ppt|Introduction]]
* [[Media:PALM_Seminar_ForWind_04_2012_Day1_02_PALM_overview.ppt|Overview]]
* [[Media:PALM_Seminar_ForWind_04_2012_Day1_03_PALM_installation_on_FLOW_090312.ppt|Installation on FLOW]] (Please see above for the updated installation instructions!)
* [[Media:PALM_Seminar_ForWind_04_2012_Day1_05_Introduction_to_NCL_ForWind_2012.ppt|Introduction to NCL]]
=== Day 2 ===
* Exercise: Neutral boundary layer
* Numerical boundary conditions
* Program control
* Program structure
* Runs with mrun (part 1)
* Runs with mrun (part 2)
=== Day 3 ===
* Parallelization
* Debugging
* Non-cyclic boundary conditions
* Restarts with mrun
* Interface Exercise
* User defined code
* LES of wake flows
== External Links ==