Revision as of 14:55, 3 September 2021
Introduction
Lumerical FDTD Solutions is a software package for solving 3D Maxwell's Equations using the Finite Difference Time Domain method.
Installed Version
Since late 2019, FDTD-Solutions has been part of the software suite Lumerical.
For this reason, the module 'Lumerical' must be loaded to use FDTD-Solutions in the newer versions.
Also, since Lumerical/2021-R2.1-2779, all formerly installed versions are no longer able to communicate with the updated license server. This is why we had to delete all previous modules from the cluster. For archiving purposes, the now deleted modules are listed down below.
The currently installed versions are:
On environment hpc-env/8.3
Lumerical/2021-R2.1-2779
Using FDTD Solutions GUI
If you need to work with the GUI (graphical user interface), it is mandatory to log in correctly. When logging in, you have to add the option -X to your SSH command. This option ensures that the program's GUI is forwarded to your device.
ssh abcd1234@carl.hpc.uni-oldenburg.de -X
Of course, this means that your device must be able to display graphical elements such as browsers, office programs or the like. When you are logged in correctly, you just have to activate the module and start the program's GUI:
module load hpc-env/8.3   # for the newest version
module load Lumerical     # or load FDTD_Solutions/<version> if you need an older version
fdtd-solutions &
Now FDTD Solutions should pop up on your display.
But since FDTD-Solutions became part of Lumerical, you can choose between many different programs.
To show all available programs and to start one of them, load the environment and module as shown above and type in
launcher
Hint: Should you have trouble with X-forwarding (e.g. the GUI is displayed only partially), you could try logging in with Remote Desktop.
Note: The GUI requires a Design license; if you see a license error message, the most likely cause is that someone else is currently using the GUI. The GUI should mainly be used to prepare a .fsp file which is then processed in batch mode (see below). Any calculation started from the GUI will be carried out on the login node. This should only be done for small test cases or to determine the time and memory requirements of a job.
Should you need help getting started, the developer's guide might help.
Using FDTD Solutions in parallel batch mode
The recommended way of using FDTD Solutions is in batch mode on the compute nodes. This can be achieved in several ways (none of which uses the GUI).
The easy way
After you have loaded the module for FDTD Solutions, you can use the command
$ fdtd-run-slurm.sh -n <n> your_model.fsp
where <n> is the number of parallel tasks (the default is 8). The file your_model.fsp describes your model, and you can add more fsp-files to the command. For each fsp-file, the command will create a job script and submit it to the cluster. The job will then run as soon as enough resources are available (the command estimates the required resources (time and memory) for you).
For example, if after loading the module you run the commands
$ cp $EBROOTFDTD_SOLUTIONS/examples/paralleltest.fsp .   # in the new versions, you have to use the variable $EBROOTLUMERICAL
$ fdtd-run-slurm.sh -n 24 paralleltest.fsp
the test case will be executed with 24 parallel tasks (freely distributed across the compute nodes as needed). The results of the simulation appear to be written back into the input fsp-file itself, so it is probably a good idea to make a copy of that file first. In addition, a log-file and a slurm-<jobid>.out are created.
The script fdtd-run-slurm.sh comes with a number of options, which can be seen from
$ fdtd-run-slurm.sh -h
The calling convention for fdtd-run-slurm.sh is:

  fdtd-run-slurm.sh [<options>] fsp1 [fsp2 ... [fspN]]

The arguments are as follows:
  fsp*  An FDTD Solutions project file. One is required, but multiple
        can be specified on one command line
  -n    The number of processes to use for the job(s). If no argument
        is given a default value of 8 is used
  -N    The number of nodes to use for the job(s). If no argument is
        given SLURM will distribute the processes as resources are
        available (may not be optimal).
  -m    The number of processes (tasks) per node to use. Exclusive with
        -n option, if not used the number of processes is determined by
        the value given with -n.
  -p    The partition to use for the job(s). If no argument is given,
        the default partition carl.p is used.
  -h    Print this help. No job is started
This allows you to pass the most important Slurm parameters. For example, it is recommended to use
$ fdtd-run-slurm.sh -N 1 -m 24 paralleltest.fsp
to use a single compute node with 24 processes as this might improve performance (see below).
The expert way
Alternatively, you can just write your own job script (instead of the automatically generated one). This allows you to better control how the job is run on the cluster and maybe use additional options for FDTD Solutions.
A good start is to take an automatically created job script and modify it as needed. For example, the job script from the simple example above could look like this:
$ cat paralleltest.sh
#!/bin/bash
#
# template for integration of FDTD_Solutions with SLURM
# (based on PBS template provided by Lumerical)
#
# created 26/07/2018 (SH@UOL)

#SBATCH --partition carl.p
#SBATCH --license fdtd:1

# resources (to be adjusted by master script)
#SBATCH --time 0:34:59
#SBATCH -n 24
#SBATCH --mem-per-cpu 2000

# reload module commands
module restore
module load hpc-env/6.4
module load FDTD_Solutions/8.20.1731

# job commands
echo "Starting run at: `date`"
echo "Running on $SLURM_JOB_NUM_NODES nodes with $SLURM_NTASKS processors."
MY_PROG=$(which fdtd-engine-mpich2nem)
MPIEXE="$EBROOTFDTD_SOLUTIONS/mpich2/nemesis/bin/mpiexec -binding"
INPUT="paralleltest.fsp"
echo "MPI command: $MPIEXE"
echo "Engine Command: $MY_PROG -t 1"
echo "Input File: $INPUT"
$MPIEXE $MY_PROG -t 1 ./${INPUT}
echo "Job finished at: `date`"
exit
Now you can change the requested resources as needed, e.g. set memory to 5G per CPU. Once the job script is ready, you can submit it with
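For instance, raising the memory to 5G per CPU and allowing a longer runtime would amount to replacing the corresponding resource lines in the script header (the values here are illustrative; adjust them to your own job):

```shell
#SBATCH --time 2:00:00
#SBATCH -n 24
#SBATCH --mem-per-cpu 5G
```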
$ sbatch paralleltest.sh
Performance Considerations
When submitting jobs for FDTD Solutions, it might be useful to first test how well it scales across multiple nodes and CPU cores. In general, performance is likely to be better if fewer nodes are used. Using the simple paralleltest.fsp example with the command
$ fdtd-run-slurm.sh -N <n> -m <m> paralleltest.fsp
the following results were obtained:
Performance in Mnodes/s

Number of tasks (x) | Number of nodes (<n>=x; <m>=1) | Tasks per node (<n>=1; <m>=x)
--------------------|--------------------------------|------------------------------
1 (serial job)      | 99.0484                        | 81.0396
2                   | 136.534                        | 125.156
4                   | 175.177                        | 267.918
8                   | 305.437                        | 381.796
16                  | 257.267                        | 661.545
24                  | 420.983                        | 661.545
As one can see, increasing the number of tasks generally gives a higher overall performance; however, the performance per task decreases as more tasks are used. This is typical behaviour for parallel applications due to the communication overhead. In addition, this very small example is not well suited for benchmarking.
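The drop in per-task performance can be quantified as parallel efficiency (speedup divided by the number of tasks). A quick sketch using the "tasks per node" column of the table above:

```shell
# speedup = perf_n / perf_serial; efficiency = speedup / n
# numbers taken from the single-node ("tasks per node") column above
serial=81.0396
for entry in "2 125.156" "4 267.918" "8 381.796" "16 661.545" "24 661.545"; do
    set -- $entry
    awk -v n="$1" -v p="$2" -v s="$serial" \
        'BEGIN { printf "n=%-2d  speedup=%.2f  efficiency=%.0f%%\n", n, p/s, 100*p/(s*n) }'
done
```

The efficiency falls well below 100% as n grows, which illustrates the communication overhead mentioned above.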
A second point to notice is that running all tasks on the same node is generally better than running each task on a different node, in particular if the number of tasks is large, because the communication overhead is smaller within a single node.
Finally, it is likely that the computation is memory bound (limited by the bandwidth between memory and CPU) which means that other jobs running on the same node may interfere with the performance.
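If such interference is a concern, one option (at the cost of potentially longer queue times) is to request the allocated node(s) exclusively in the job script; this uses Slurm's standard --exclusive flag:

```shell
#SBATCH --exclusive   # do not share the allocated node(s) with other jobs
```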
Therefore, the recommendation for optimal performance is to use all the cores of a single node or of several nodes. In this simple test case, however, using more than one node does not increase performance. Better performance can also be obtained by using the partition mpcb.p, which has CPUs with 16 cores but a faster clock speed (in the test case, a performance of 771.802 Mnodes/s was achieved).
Former installed versions
In September 2021, we had to remove all previously installed modules because the developing company changed (to ANSYS) and the licensing process therefore had to be updated, which made every former version obsolete.
The formerly installed versions can be looked up here:
On environment hpc-uniol-env
FDTD_Solutions/8.20.1634
On environment hpc-env/6.4
FDTD_Solutions/8.20.1731
FDTD_Solutions/8.21.1933
Lumerical/2019b-rc3