Difference between revisions of "FDTD Solutions / Lumerical 2016"

Revision as of 14:55, 3 September 2021

Introduction

Lumerical FDTD Solutions is a software package for solving the 3D Maxwell's equations using the finite-difference time-domain (FDTD) method.


Installed Version

Since late 2019, FDTD-Solutions has been part of the software suite Lumerical.
For this reason, the module 'Lumerical' must be loaded to use the newer versions of FDTD-Solutions.

Also, since the installation of Lumerical/2021-R2.1-2779, none of the previously installed versions can communicate with the updated license server. We therefore had to delete all previous modules from the cluster. For reference, the deleted modules are listed at the end of this page.

The currently installed versions are:

On environment hpc-env/8.3
Lumerical/2021-R2.1-2779

Using FDTD Solutions GUI

If you need to work with the GUI (graphical user interface), you must log in with X forwarding enabled. To do so, add the option -X to your SSH command. This option makes sure that the program's GUI is forwarded to your device.

ssh abcd1234@carl.hpc.uni-oldenburg.de -X

Of course, this means that your device must be able to display graphical applications. Once you are logged in correctly, you just have to load the module and start the program's GUI:

module load hpc-env/8.3   # environment that provides the current Lumerical module
module load Lumerical
fdtd-solutions &

Now FDTD Solutions should appear on your display.


Since FDTD-Solutions became part of Lumerical, you can choose between many different programs. To list all available programs and start one of them, load the environment and module as shown above and type

launcher

Hint: Should you have trouble with X forwarding (e.g. the GUI is only partially displayed), you could try to log in with Remote Desktop.
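
Before starting the GUI, it can help to check whether X forwarding is active at all. ssh -X normally sets the DISPLAY variable on the login node, so an empty DISPLAY is a strong hint that the option was forgotten. The following is only a minimal sketch; the function name check_x_forwarding is made up for this example and is not part of the installed software.

```shell
#!/bin/bash
# Hypothetical helper: report whether an X display is available in this session.
# ssh -X normally sets DISPLAY (e.g. to localhost:10.0) on the remote side.
check_x_forwarding() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set: log in again with 'ssh -X'" >&2
        return 1
    fi
    echo "X display available at $DISPLAY"
}
```

If the function reports a display but the GUI is still broken, the problem is more likely on the side of your local X server.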

Note: The GUI requires a Design license. If you see a license error message, the most likely cause is that someone else is using the GUI. The GUI should mainly be used to prepare a .fsp file, which is then processed in batch mode (see below). Any calculation started from the GUI is carried out on the login node, so this should only be done for small test cases or to determine the time and memory requirements of a job.

Should you need help getting started, the developer's guide might help.

Using FDTD Solutions in parallel batch mode

The recommended way of using FDTD Solutions is in batch mode on the compute nodes. This can be achieved in several ways (none of which uses the GUI).

The easy way

After you have loaded the module for FDTD Solutions, you can use the command

$ fdtd-run-slurm.sh -n <n> your_model.fsp

where <n> is the number of parallel tasks (default is 8). The file your_model.fsp describes your model, and you can add more fsp-files to the command. For each fsp-file, the command creates a job script and submits it to the cluster. The job then runs as soon as enough resources are available (the command estimates the required time and memory for you).

For example, if you run the following commands after loading the module

$ cp $EBROOTFDTD_SOLUTIONS/examples/paralleltest.fsp .   # In the new versions, you have to use the variable $EBROOTLUMERICAL
$ fdtd-run-slurm.sh -n 24 paralleltest.fsp

the test case will be executed with 24 parallel tasks (freely distributed across the compute nodes as needed). The results of the simulation appear to be written to the same file as the input fsp file, so it is probably a good idea to make a copy of that file first. In addition, a log-file and a slurm-<jobid>.out are created.
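
Since the results seem to end up in the input file, keeping an untouched copy can be automated. The sketch below assumes a naming scheme model.fsp -> model.orig.fsp; both the scheme and the function name backup_fsp are made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: keep an untouched copy of each project file,
# because the simulation results appear to be written into the input file.
backup_fsp() {
    for fsp in "$@"; do
        backup="${fsp%.fsp}.orig.fsp"
        # an existing backup is never overwritten
        [ -e "$backup" ] || cp -- "$fsp" "$backup" || return 1
    done
}
```

A possible use is `backup_fsp paralleltest.fsp && fdtd-run-slurm.sh -n 24 paralleltest.fsp`, which submits the job only if the copy succeeded.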

The script fdtd-run-slurm.sh comes with a number of options, which can be seen from

$ fdtd-run-slurm.sh -h
The calling convention for fdtd-run-slurm.sh is:

fdtd-run-slurm.sh [<options>] fsp1 [fsp2 ... [fspN]]

The arguments are as follows:

 fsp*      An FDTD Solutions project file. One is required, but
           multiple can be specified on one command line

 -n        The number of processes to use for the job(s).
           If no argument is given a default value of 8 is used

 -N        The number of nodes to use for the job(s).
           If no argument is given SLURM will distribute the processes
           as resources are available (may not be optimal).

 -m        The number of processes (tasks) per node to use.
           Exclusive with -n option, if not used the number of processes
           is determined by the value given with -n.

 -p        The partition to use for the job(s).
           If no argument is given, the default partition carl.p is used.

 -h        Print this help. No job is started

This allows you to pass the most important Slurm parameters. For example, it is recommended to use

$ fdtd-run-slurm.sh -N 1 -m 24 paralleltest.fsp

to use a single compute node with 24 processes as this might improve performance (see below).

The expert way

Alternatively, you can just write your own job script (instead of the automatically generated one). This allows you to better control how the job is run on the cluster and maybe use additional options for FDTD Solutions.

A good start is to use an automatically created job script and modify it as needed. For example, the job script from the simple example above could look like this:

$ cat paralleltest.sh 
#!/bin/bash
# 
# template for integration of FDTD_Solutions with SLURM
# (based on PBS template provided by Lumerical)
# 
# created 26/07/2018 (SH@UOL)

#SBATCH --partition carl.p
#SBATCH --license   fdtd:1

# resources (to be adjusted by master script)
#SBATCH --time         0:34:59
#SBATCH  -n 24
#SBATCH --mem-per-cpu  2000

# reload module commands
module restore
module load hpc-env/6.4
module load FDTD_Solutions/8.20.1731

# job commands
echo "Starting run at: `date`"
echo "Running on $SLURM_JOB_NUM_NODES nodes with $SLURM_NTASKS processors."
MY_PROG=$(which fdtd-engine-mpich2nem)
MPIEXE="$EBROOTFDTD_SOLUTIONS/mpich2/nemesis/bin/mpiexec -binding"
INPUT="paralleltest.fsp"
echo "MPI command:     $MPIEXE"
echo "Engine Command:  $MY_PROG -t 1"
echo "Input File:      $INPUT"
$MPIEXE $MY_PROG -t 1 ./${INPUT}
echo "Job finished at: `date`"
exit

Now you can change the requested resources as needed, e.g. set memory to 5G per CPU. Once the job script is ready, you can submit it with

$ sbatch paralleltest.sh
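
The memory change mentioned above can also be scripted instead of edited by hand, which is handy when several generated job scripts need the same adjustment. This is a sketch under the assumption that the script contains exactly one "#SBATCH --mem-per-cpu" line, as in the example; the function name set_mem_per_cpu is made up here.

```shell
#!/bin/bash
# Hypothetical helper: rewrite the --mem-per-cpu request in a job script.
# $1 = job script, $2 = new value (e.g. 5G, or a plain number meaning MB)
set_mem_per_cpu() {
    sed -E "s/^(#SBATCH[[:space:]]+--mem-per-cpu)[[:space:]]+.*/\1  $2/" "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}
```

For example, `set_mem_per_cpu paralleltest.sh 5G` followed by `sbatch paralleltest.sh` would request 5G per CPU for the test case.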

Performance Considerations

When submitting jobs for FDTD Solutions, it might be useful to first test how well it scales on multiple nodes and CPU cores. In general, the performance is likely to be better if fewer nodes are used. For the simple paralleltest.fsp example, the command

$ fdtd-run-slurm.sh -N <n> -m <m> paralleltest.fsp

gave the following results:

Performance in Mnodes/s

Number of tasks (x)   Number of nodes (<n>=x; <m>=1)   Tasks per node (<n>=1; <m>=x)
1 (serial job)        99.0484                          81.0396
2                     136.534                          125.156
4                     175.177                          267.918
8                     305.437                          381.796
16                    257.267                          661.545
24                    420.983                          661.545
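
To repeat such a scaling test with your own model, the submit commands for the single-node series can be generated in a loop. The sketch below only prints the commands so that they can be reviewed first. The function name scaling_commands and the file naming scheme are made up for this example. Each run gets its own copy of the project file, because the results appear to be written into the input file.

```shell
#!/bin/bash
# Hypothetical helper: print (not run) one submit command per task count,
# all on a single node. $1 = project file, remaining arguments = task counts.
scaling_commands() {
    model=$1
    shift
    for m in "$@"; do
        copy="${model%.fsp}_m${m}.fsp"
        echo "cp $model $copy && fdtd-run-slurm.sh -N 1 -m $m $copy"
    done
}

scaling_commands paralleltest.fsp 1 2 4 8 16 24
```

Once the printed commands look right, they can be pasted into the shell on the login node. File names containing spaces are not handled by this sketch.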

As one can see, increasing the number of tasks generally gives a higher overall performance, but the performance per task decreases as tasks are added. This is typical behaviour for parallel applications due to the communication overhead. In addition, the very small example is not well suited for benchmarking.
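
The drop in performance per task can be made explicit by computing the speedup P(n)/P(1) and the parallel efficiency, i.e. the speedup divided by the number of tasks n. The following awk-based sketch does this for values taken from the single-node column of the table; the function name parallel_efficiency is made up for this example.

```shell
#!/bin/bash
# speedup = P(n) / P(1), efficiency = speedup / n
# $1 = tasks n, $2 = performance with n tasks, $3 = performance of the serial job
parallel_efficiency() {
    awk -v n="$1" -v pn="$2" -v p1="$3" \
        'BEGIN { s = pn / p1; printf "n=%d speedup=%.2f efficiency=%.2f\n", n, s, s / n }'
}

# Single-node column of the table above (Mnodes/s)
parallel_efficiency 8  381.796 81.0396
parallel_efficiency 24 661.545 81.0396
```

For the table above this gives an efficiency of roughly 0.59 with 8 tasks and 0.34 with 24 tasks on a single node, so each additional task contributes less than the previous ones.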

A second point to notice is that running all tasks on the same node is generally better than running each task on a different node, in particular if the number of tasks is large, because the communication overhead is smaller on a single node.

Finally, it is likely that the computation is memory bound (limited by the bandwidth between memory and CPU) which means that other jobs running on the same node may interfere with the performance.

Therefore, the recommendation for optimal performance is to use all the cores on a single node or on several nodes. In the simple test case, however, using more than one node does not increase the performance. Better performance can also be obtained by using the partition mpcb.p, which has CPUs with 16 cores but a faster clock speed (in the test case a performance of 771.802 Mnodes/s was achieved).


Formerly installed versions

In September 2021, we had to remove all previously installed modules because the developer's company changed (to ANSYS). The licensing process therefore had to be updated, which made every former version obsolete.

The formerly installed versions can be looked up here:

On environment hpc-uniol-env
FDTD_Solutions/8.20.1634

On environment hpc-env/6.4
FDTD_Solutions/8.20.1731
FDTD_Solutions/8.21.1933
Lumerical/2019b-rc3 (FDTD version: 8.22.2025)