<div style="text-align:justify">
__NOTOC__
__NOEDITSECTION__
<div style="text-align:justify;">
<center>
{| style="text-align:justify;font-size:1.2em;line-height:1.2em;background-color:#eeeeff;" border="1" cellspacing="0"
|-
| [[Image:picture_of_nodes.jpg|155px]]
| [[Image:picture_of_cluster_closed.jpg|70px]]
| ''This is the HPC-Wiki of the University of Oldenburg''<br>
| [[Image:picture_of_gpfs.jpg|82px]]
| [[Image:picture_of_infinyband.jpg|155px]]
|}
</center>


'''Note''': This is a first, '''preliminary''' version (v0.01) of the HPC User Wiki. Its primary purpose is to get you started with our new clusters (FLOW and HERO), enabling you to familiarize yourself with these systems and gather some experience. More elaborate, improved versions will follow, so you may want to check these pages regularly.
= Basic Information =
<center>
{| style="background-color:#eeeeff;" cellpadding="10" border="1" cellspacing="0"
|- style="background-color:#ddddff;"
! HPC Facilities
! Login
! User environment
! Compiling and linking
! Job Management (Queueing) System
! Altix UV 100 system
! Examples
|- valign="top"
|
* [[HPC Facilities of the University of Oldenburg| Overview]]
* [[HPC Facilities of the University of Oldenburg#FLOW| FLOW]]
* [[HPC Facilities of the University of Oldenburg#HERO| HERO]]
* [[HPC Policies| HPC Policies]]
* [[Unix groups| Groups ]]
* [[Acknowledging_the_HPC_facilities| Acknowledging FLOW/HERO]]
* [[User Meetings]]
|
* [[Logging in to the system#From within the University (intranet) | From University]]
* [[Logging in to the system#From outside the University (internet) | From Home]]
|
* [[User environment - The usage of module| Usage of module]]
* [[File system| File System / Quotas]]
* [[Mounting Directories of FLOW and HERO#Windows | Shares under Windows]]
* [[Mounting Directories of FLOW and HERO#Linux | Shares under Linux]]
* [[License servers]]
|
* [[Compiling and linking|Basics]]
* [[GNU Compiler]]
* [[Intel Compiler]]
* [[PGI Compiler]]
* [[Open64 Compiler]]
* [[Using the Altix UV 100 system#Compiling and linking applications| Altix UV 100]]


|
* [[SGE Job Management (Queueing) System| Overview]]
* [[SGE Job Management (Queueing) System#Submitting jobs| Submitting ]]
* [[SGE Job Management (Queueing) System#Specifying job requirements| Job requirements ]]
* [[SGE Job Management (Queueing) System#Parallel environments (PEs) | Parallel jobs ]]
* [[SGE Job Management (Queueing) System#Interactive jobs | Interactive jobs ]]
* [[SGE Job Management (Queueing) System#Monitoring and managing your jobs | Commands ]]
* [[SGE Job Management (Queueing) System#Array jobs| Job arrays  ]]
* [[SGE Job Management (Queueing) System#Environment variables | Environment variables]]
* [[Brief_Introduction_to_HPC_Computing#Checking_the_status_of_the_job | Checking the job status]] [[Brief_Introduction_to_HPC_Computing#Checking_the_status_of_the_job_2| (par. jobs)]]
* [[Brief_Introduction_to_HPC_Computing#Details_for_finished_jobs| Obtaining details for finished jobs]]
* [[SGE Job Management (Queueing) System#Documentation | Documentation]]
* [[Queues_and_resource_allocation| On Queues and resource allocation]]
|
* [[Using the Altix UV 100 system#Compiling and linking applications| Compiling]]
* [[Using the Altix UV 100 system#Submitting SGE jobs| Submitting]]
* [[Using the Altix UV 100 system#Documentation| Documentation]]
|
* [[Brief Introduction to HPC Computing| Brief Introduction to HPC Computing]]
* [[Matlab Examples using MDCS| Matlab examples using MDCS]]
* [[MDCS Basic Example]] (for R2014b and later)
* [[HPC Tutorial No1| HPC Tutorial 2013]]
* [[HPC Introduction October 6-8, 2014| HPC Tutorial 2014]]
* [[HPC Introduction October 7-9, 2015| HPC Tutorial 2015]]
|-
|}
</center>


= Introduction =
Presently, the central HPC facilities of the University of Oldenburg comprise three systems:
*FLOW ('''F'''acility for '''L'''arge-Scale C'''O'''mputations in '''W'''ind Energy Research)<br> IBM iDataPlex cluster solution, 2232 CPU cores, 6 TB of (distributed) main memory, QDR InfiniBand interconnect.<br>Theoretical peak performance: '''24 TFlop/s'''.
*HERO ('''H'''igh-'''E'''nd Computing '''R'''esource '''O'''ldenburg)<br>Hybrid system composed of two components:
**IBM iDataPlex cluster solution, 1800 CPU cores, 4 TB of (distributed) main memory, Gigabit Ethernet interconnect.<br>Theoretical peak performance: '''19.2 TFlop/s'''.
**SGI Altix UltraViolet shared-memory system ("SMP component"), 120 CPU cores, 640 GB of globally addressable memory, NumaLink5 interconnect<br>Theoretical peak performance: '''1.3 TFlop/s'''.
*[http://www.csc.uni-oldenburg.de GOLEM]: older, AMD Opteron-based cluster with 390 cores and 800 GB of (distributed) main memory.<br>Theoretical peak performance: 1.6 TFlop/s.
FLOW and HERO use a common, shared storage system (high-performance NAS Cluster) with a net capacity of 130 TB.
FLOW is used for computationally demanding CFD calculations in wind energy research, conducted by the Research Group [http://twist.physik.uni-oldenburg.de/en/index.html TWiST] (Turbulence, Wind Energy, and Stochastics) and the [http://www.forwind.de/forwind/index.php?article_id=1&clang=1 ForWind] Center for Wind Energy Research. It is, to the best of our knowledge, the largest system in Europe dedicated solely to that purpose.
The main application areas of the HERO cluster are Quantum Chemistry, Theoretical Physics, the Neurosciences, and Audiology. Besides that, the system is used by many other research groups of the [http://www.fk5.uni-oldenburg.de Faculty of Mathematics and Science] and the [http://www.informatik.uni-oldenburg.de Department of Informatics] of the School of Computing Science, Business Administration, Economics, and Law.
= Hardware Overview  =
== FLOW  ==
*122 "low-memory" compute nodes: IBM dx360 M3, dual socket (Westmere-EP, 6C, 2.66 GHz), 12 cores per server, 24 GB DDR3 RAM, diskless (host names <tt>cfdl001..cfdl122</tt>).
*64 "high-memory" compute nodes: IBM dx360 M3, dual socket (Westmere-EP, 6C, 2.66 GHz), 12 cores per server, 48 GB DDR3 RAM, diskless (host names <tt>cfdh001..cfdh064</tt>).
*QDR InfiniBand interconnect (fully non-blocking), 198-port Mellanox IS5200 IB switch (can be extended up to 216 ports).
*Gigabit Ethernet for File-I/O etc.
*10/100 Mb/s Ethernet for management and administrative tasks (IPMI).
== HERO  ==
*130 "standard" compute nodes: IBM dx360 M3, dual socket (Westmere-EP, 6C, 2.66 GHz), 12 cores per server, 24 GB DDR3 RAM, 1 TB SATAII disk (host names <tt>mpcs001..mpcs130</tt>).
*20 "big" compute nodes: IBM dx360 M3, dual socket (Westmere-EP, 6C, 2.66 GHz), 12 cores per server, 48 GB DDR3 RAM, RAID 8 x 300 GB 15k SAS (host names <tt>mpcb001..mpcb020</tt>)
*Gigabit Ethernet II for communication of parallel jobs (MPI, LINDA, ...).
*Second, independent Gigabit Ethernet for File-I/O etc.
*10/100 Mb/s Ethernet for management and administrative tasks (IPMI).
*SGI Altix UV 100 shared-memory system, 10 CPUs (Nehalem-EX, "Beckton", 6C, 2.66 GHz), 120 cores in total, 640 GB DDR3 RAM, NumaLink5 interconnect, RAID 20 x 600 GB SAS 15k rpm (host <tt>uv100</tt>).
The 1 Gb/s leaf switches have uplinks to a 10 Gb/s backbone (two switches, redundant). The central management interface of both clusters runs on two master nodes (IBM x3550 M3) in an HA setup. Each cluster has two login nodes (IBM x3550 M3).
Operating system: '''Scientific Linux 5.5'''
Cluster management software: '''Bright Cluster Manager 5.1''' by [http://www.clustervision.com ClusterVision B.V.]
= Basic Usage  =
== Logging in to the system  ==
=== From within the University (intranet)  ===
Within the internal network of the University, access to the systems is granted via ssh. Use your favorite ssh client, such as OpenSSH, PuTTY, etc. For example, on a UNIX/Linux system, users of FLOW may type on the command line (replace "abcd1234" by your own account):
ssh abcd1234@flow.hpc.uni-oldenburg.de
Similarly, users of HERO login by typing:
ssh abcd1234@hero.hpc.uni-oldenburg.de
Use "<tt>ssh -X</tt>" for X11 forwarding (i.e., if you need to export the graphical display to your local system).
For security reasons, access to the HPC systems is denied from certain subnets. In particular, you cannot login from the WLAN of the University (uniolwlan) or from "public" PCs (located, e.g., in Libraries, PC rooms, or at other places).
=== From outside the University (internet)  ===
First, you have to establish a VPN tunnel to the University intranet. After that, you can login to HERO or FLOW via ssh as described above. The data of the tunnel are:
Gateway      &nbsp;: vpn2.uni-oldenburg.de
Group name  &nbsp;: hpc-vpn
Group password: hqc-vqn
Cf. the [http://www.itdienste.uni-oldenburg.de/21240.html instructions] of the IT Services on how to configure the Cisco VPN client. For the HPC systems, a separate VPN tunnel has been installed, which is only accessible for users of FLOW and HERO. Therefore, you have to configure a new VPN connection and enter the data provided above. For security reasons, you cannot login to FLOW or HERO if you are connected to the intranet via the "generic" VPN tunnel of the University.
== User Environment  ==
We use the modules environment, which is very flexible and user-friendly and even allows different versions of the same software to be used concurrently on the same system. You can see a list of all available modules by typing
module avail
To load a given module:
module load <name of the module>
The modules system uses a hierarchical file structure, i.e., sometimes (whenever there are ambiguities) you may have to specify a path, as in:
module load fftw2/gcc/64/double
To revert all changes made by a given module (environment variables, paths, etc.):
module unload <name of the module>
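For illustration, a typical sequence of module commands might look like the following sketch (the module names are taken from the examples above and may differ on the current system):
<pre>
# show which modules are currently loaded
module list

# load a compiler and a library module
module load gcc
module load fftw2/gcc/64/double

# ... compile and run your program ...

# revert the changes made by the library module
module unload fftw2/gcc/64/double
</pre>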
== Compiling and linking  ==
This section will be elaborated later and then provide much more detailed information. For the time being, we only give a '''very''' brief overview.
The following compilers and MPI libraries are currently available:
* GCC, the GNU Compiler Collection: <tt>gcc</tt> Version 4.3.4<pre>module load gcc</pre>This module is loaded by default when you log in to the system. Supported MPI libraries: OpenMPI, MPICH, MPICH2, MVAPICH, and MVAPICH2.
* Intel Cluster Studio 2011, formerly known as Intel Cluster Toolkit Compiler Edition (contains the ''Math Kernel Library'' and other performance libraries, analyzers, and HPC tools):<pre>module load intel/ics</pre>The environment for the Intel MPI library must be loaded separately:<pre>module load intel/impi</pre>The Fortran compiler is invoked by <tt>ifort</tt>, and the C/C++ compiler by <tt>icc</tt>. However, if one wants to build MPI applications, one should generally use the wrapper scripts <tt>mpif77</tt>, <tt>mpif90</tt>, <tt>mpicc</tt>, ...
* PGI Cluster Development Kit, Version 11.3: contains a suite of Fortran and C/C++ compilers as well as various other tools (MPI debugger etc.):<pre>module load pgi</pre>The compilers are invoked by <tt>pgf77</tt>, <tt>pgf95</tt>, ... and <tt>pgcc</tt>, <tt>pgcpp</tt>, ... for Fortran and C/C++, respectively. Again, wrapper scripts exist for building MPI applications.<br>Supported MPI libraries: MPICH, MPICH2, and MVAPICH.
(At the moment, MPICH and MPICH2 have problems running under the queueing system and thus their use is not recommended, but that problem will be fixed soon.)
It is planned to extend the MPI support for the various compilers. In particular, OpenMPI will soon be available for the Intel compiler, too.
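As a brief example, compiling an MPI program with the Intel toolchain described above might look as follows (the source and binary names are placeholders):
<pre>
module load intel/ics
module load intel/impi

# build MPI applications with the wrapper scripts
mpif90 -O2 -o myprog_f90 myprog.f90
mpicc  -O2 -o myprog_c   myprog.c
</pre>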
==== Documentation  ====
*[http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/fortran/lin/index.htm Intel Fortran compiler User and Reference Guides]
*[http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/cpp/lin/index.htm Intel C/C++ Compiler]
*[http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/start/lin/cpp/index.htm Intel Getting started tutorial]
*[http://software.intel.com/sites/products/documentation/hpc/mkl/userguides/mkl_userguide_lnx/index.htm Intel Math Kernel Library User's Guide]
*[http://www.pgroup.com/doc/pgiug.pdf PGI User's Guide (PDF)]
== Job Management (Queueing) System  ==
The queueing system employed to manage user jobs for FLOW and HERO is [http://wikis.sun.com/display/GridEngine/Home Sun Grid Engine] (SGE). For first-time users (especially those acquainted with PBS-based systems), some features of SGE may seem a little unusual and certainly need some getting-accustomed-to. In order to efficiently use the available hardware resources (so that all users may benefit the most from the system), a basic understanding of how SGE works is indispensable. Some of the points to keep in mind are the following:
*Unlike other (e.g., PBS-based) queueing systems, SGE does not "know" the concept of "nodes" with a fixed number of CPUs (cores) and users specifying the number of nodes they need, along with the number of CPUs per node, in their job requirements. Instead, SGE logically divides the cluster into '''slots''', where, roughly speaking, each "slot" may be thought of as a single CPU core (although there are notable exceptions to this rule, see the parallel environment <tt>linda</tt> below). The scheduler assigns free slots to pending jobs. Since in the multi-core era each host offers many slots, this will, in general, lead to jobs of different users running concurrently on the same host (provided that there are sufficient resources like memory, disk space etc. to meet all requirements of all jobs, as specified by the users who submitted them) and usually guarantees efficient resource utilization.
*While the scheduling behavior described above may be very efficient in optimally using the available hardware resources, it will have undesirable effects on parallel (MPI, LINDA, ...) jobs. E.g., an MPI job requesting 24 slots could end up running 3 tasks on one host, 12 tasks on another host (fully occupying this host, if it is a server with 2 six-core CPUs, as happens with our clusters), and 9 tasks on a third host. Clearly, such an unbalanced configuration may lead to problems. For certain jobs (multithreaded applications) it is even mandatory that all slots reside on one host (typical examples: OpenMP programs, Gaussian single-node jobs).<br> To deal with the specific demands of parallel jobs, SGE offers so-called '''parallel environments (PEs)''' which are largely configurable. Even if your job does not need several hosts, but runs on only one host using several or all cores of that host, you '''must''' specify a parallel environment. '''It is of crucial importance to choose the "correct" parallel environment''' (meeting the requirements of your application/program) when submitting a parallel job.
*Another "peculiarity" of SGE (as compared to its cousins) are the concepts of '''cluster queues''' and '''queue instances'''. Cluster queues are composed of several (typically, many) queue instances, with each instance associated with one particular host. A cluster queue may have a name like, e.g., ''standardqueue.q'', where the .q suffix is a commonly followed convention. Then the queue instances of this queue has names like, e.g. ''standardqueue.q@host001'', ''standardqueue.q@host002'', ... (note the "@" which acts as a delimiter between the queue name and the queue instance).<br> In general, each host will hold several queue instances belonging to different cluster queues. E.g. there may be a special queue for long-running jobs and a queue for shorter jobs, both of which share the same "physical" machines but have different policies. To avoid oversubscription, resource limits can be configure for individual hosts. Since resource limits and other, more complex attributes can also be associated with cluster queues and even queue instances, the system is highly flexible and can be customized for specified needs. On the other hand, the configuration quickly tends to get rather complex, leading to unexpected side effects. E.g., PEs grab slots from all queue instances of all cluster queues they are associated with. Thus, a parallel job may occupy slots on one particular host belonging to different queue instances on that host. While this is usually no problem for the parallel job itself, it blocks resources in both cluster queues which may be unintended. For that reason, it is common practice to associate each PE with one and only one cluster queue and define several (possibly identically configured) PEs in order to avoid that a single PE spans several cluster queues.
==== Submitting jobs  ====
Sample job submission scripts for both serial and parallel jobs are provided in the subdirectory <tt>Examples</tt> of your home directory. You may have to adapt these scripts as needed. Note that a job submission script consists of two principal parts:
*SGE directives (lines starting with the "magic" characters <tt>#$</tt>), which fall into three categories:
**general options (which shell to use, name of the job, name of output and error files if differing from default, etc.). The directives are passed to the <tt>qsub</tt> command when the job is submitted.
**Resource requirements (introduced by the <tt>-l</tt> flag), like memory, disk space, runtime (wallclock) limit, etc.
**Options for parallel jobs (parallel environment, number of job slots, etc.)
*Commands to be executed by the job (your program, script, etc.), including the necessary set-up of the environment for the application/program to run correctly (loading of modules so that your programs find the required runtime libraries, etc.).
The job is submitted by the <tt>qsub</tt> command, e.g. (assuming your submission script is named <tt>myprog.sge</tt>):
qsub myprog.sge
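For orientation, a minimal submission script for a serial job might look like the following sketch (job name, resource values, and program name are placeholders; adapt them to your application):
<pre>
#$ -S /bin/bash
#$ -N myjob
#$ -cwd
#$ -l h_rt=24:0:0
#$ -l h_vmem=2G

# set up the environment and run the program
module load gcc
./myprog
</pre>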
==== Specifying job requirements  ====
The general philosophy behind SGE is that you should not submit your job to a specific queue or queue instance (although this is possible in principle), but rather define your requirements and then let SGE decide which queue matches them best (taking into account the current load of the system and other factors). For this "automatic" queue selection to work efficiently, and in order to avoid wasting valuable resources (e.g., requesting much more memory than your job needs, which may prevent the scheduling of jobs of other users), it is important that you give a complete and precise specification of your job requirements in your submission script. The following points are relevant to both serial and parallel jobs.
===== Runtime =====
Maximum (wallclock) runtime is specified by <tt>h_rt=&lt;hh:mm:ss&gt;</tt>. E.g., a maximum runtime of three days is requested by:
<pre>
#$ -l h_rt=72:0:0
</pre>
The default runtime of a job is 0:0:0. Thus you should always specify a runtime, unless it is a very short job.
All cluster queues except the "long" queues have a maximum allowed runtime of '''8 days'''. It is highly recommended to specify the runtime of your job as realistically as possible (leaving, of course, a margin of error). If the scheduler knows that, e.g., a pending job is a "fast run" which needs only a few hours of walltime, it is likely that it will start executing much earlier than other jobs with more extensive walltime requirements (so-called '''backfilling''').
If your job needs more than 8 days of walltime, your submission script must contain the following line:
<pre>
#$ -l longrun=true
</pre>
It is then automatically directed to one of the "long" queues, which have no runtime limit. The number of long-running jobs per user is limited.
===== Memory =====
Maximum memory (physical + virtual) usage of a job is defined by the <tt>h_vmem</tt> attribute, as in
<pre>
#$ -l h_vmem=4G
</pre>
for a job requesting 4 GB of total memory. If your job exceeds the specified memory limit, it gets killed automatically. The default value for <tt>h_vmem</tt> is 500 MB.
'''Important''': The <tt>h_vmem</tt> attribute refers to the memory '''per job slot''', i.e. it gets multiplied by the number of slots for a parallel job.
Total memory available for jobs on each compute node:
<br>
<ul>
<li>standard compute nodes of HERO (<tt>mpcs001..mpcs130</tt>): 23 GB
<li>big nodes of HERO (<tt>mpcb001..mpcb020</tt>): 46 GB
<li>low-memory nodes of FLOW (<tt>cfdl001..cfdl122</tt>): 22 GB
<li>high-memory nodes of FLOW (<tt>cfdh001..cfdh064</tt>): 46 GB
</ul>
If your job needs one (or several) of the "big nodes" of HERO (<tt>mpcb001..mpcb020</tt>), you must specify your memory requirement '''and''' set the Boolean attribute <tt>bignode</tt> to <tt>true</tt>. Example: A job in the parallel environment "smp" (see below) requests 12 slots and 3 GB per slot (i.e., <tt>h_vmem=3G</tt>). This job needs 36 GB of memory on a single node in total and thus can only run on one of the big nodes. The corresponding section of your submission script will then read:
<pre>
#$ -l h_vmem=3G
#$ -l bignode=true
</pre>
Similarly, to request one of the high-memory nodes of FLOW (<tt>cfdh001..cfdh064</tt>), you need to set the attribute <tt>highmem</tt> to <tt>true</tt>. Example: for an MPI job with 12 tasks per node and a memory requirement of 3 GB for each task, you would specify:
<pre>
#$ -l h_vmem=3G
#$ -l highmem=true
</pre>
===== Local disk space (HERO only) =====
Local scratch space is only available on the HERO cluster, since the compute nodes of FLOW are diskless.
For requesting, e.g., 200 GB of scratch space, the SGE directive reads:
<pre>
#$ -l h_fsize=200G
</pre>
The default value is <tt>h_fsize=100M</tt>.
The path to the local scratch directory can be accessed in your job script (or other scripts/programs invoked by your job) via the <tt>$TMPDIR</tt> environment variable. After termination of your job (or if you kill your job manually by <tt>qdel</tt>), the scratch directory is automatically purged.
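A common pattern is to stage your data into the scratch directory, run the calculation there, and copy the results back before the job ends. The following sketch uses placeholder file names; <tt>$SGE_O_WORKDIR</tt> is set by SGE to the directory from which the job was submitted:
<pre>
#$ -l h_fsize=200G

# copy the input data to the local scratch directory
cp input.dat $TMPDIR
cd $TMPDIR

# run the calculation in the scratch directory
$SGE_O_WORKDIR/myprog input.dat > output.log

# copy the results back before $TMPDIR is purged
cp output.log $SGE_O_WORKDIR
</pre>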
Total amount of scratch space available on each compute node:
<ul>
<li> standard nodes (<tt>mpcs001..mpcs130</tt>): 800 GB
<li> big nodes (<tt>mpcb001..mpcb020</tt>): 2100 GB
</ul>
If your job needs more than 800 GB of scratch space, you must request one of the big nodes. Example:
<pre>
#$ -l h_fsize=1400G
#$ -l bignode=true
</pre>
==== Parallel environments (PEs)  ====
'''Example''': If you have an MPI program compiled and linked with the Intel Compiler and MPI library,
your job submission script might look like follows:
<pre>
#$ -pe intelmpi 96 
#$ -R y
module load intel/impi
mpiexec -machinefile $TMPDIR/machines -n $NSLOTS -env I_MPI_FABRICS shm:ofa ./myprog_intelmpi
</pre>
In that case, the MPI job uses the InfiniBand fabric for communication (the I_MPI_FABRICS variable).
Turning on resource reservation (<tt>-R y</tt>) is highly recommended in order to avoid starvation of parallel jobs by serial jobs which "block" required slots on specific hosts. The job requests 96 cores. The allocation rule of this PE is "fill-up", i.e., SGE tries to place the MPI tasks on as few hosts as possible (in the "ideal" case, the program would run on exactly 8 hosts with 12 cores (slots) on each host, but there is no guarantee that this is going to happen).
Please have a look at the directory named <tt>Examples</tt> in your home directory, which contains further examples of how to submit parallel (MPI) jobs.
List of all currently available PEs:
*<tt>intelmpi</tt> for using the Intel MPI Library, see above.
*<tt>openmpi</tt> for using the OpenMPI Library (so far, only supported with the <tt>gcc</tt> compiler)
*<tt>mvapich</tt> for MVAPICH library (i.e., InfiniBand interconnects)
*<tt>smp</tt>: this PE demands that '''all''' requested slots be on the same host (needed for multithreaded applications, like Gaussian single-node jobs, OpenMP, etc.)
*<tt>linda</tt>: special PE for Gaussian Linda jobs, see below.
If your job is to run in one of the "long" queues (i.e., requesting more than 8 days of walltime), you must use the corresponding "long" version of the PE: intelmpi_long, openmpi_long, etc.
Note that the above list will grow over time. E.g., it is planned to support OpenMPI with the Intel Compiler (not only the <tt>gcc</tt> compiler, as is now the case).
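As an illustration of the <tt>smp</tt> environment, a multithreaded (e.g., OpenMP) job that needs all 12 cores of a single node could be sketched as follows (the program name is a placeholder; <tt>$NSLOTS</tt> is set by SGE to the number of granted slots):
<pre>
#$ -pe smp 12
#$ -R y
#$ -l h_vmem=1900M

# use one thread per granted slot
export OMP_NUM_THREADS=$NSLOTS
./myprog_openmp
</pre>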
... tbc ...
<br>
==== Interactive jobs  ====
Interactive jobs are only allowed for members of certain groups from the Institute of Psychology who have special data pre-processing needs which require manual intervention and cannot be automated (which would be the prerequisite for writing a batch job script).
Users who are entitled to submit interactive jobs type
qlogin -l xtr=true
on the command line ("xtr" means "extra queue"). After that, a graphical Matlab session can be started by issuing the
following two commands:
module load matlab
matlab &
(Sending the Matlab process to the background gives you control over the shell, which may be useful.)
If you do not specify any memory requirements, your interactive job will be limited to using at most 500 MB. If you need more (e.g., 2 GB), you have to request the memory explicitly, as in:
qlogin -l xtr=true -l h_vmem=2G
Note that the syntax is the same as for specifying resource requirements in a job submission script (a resource request starts with the <tt>-l</tt> flag).
==== Monitoring and managing your jobs  ====
A selection of the most frequently used commands for job monitoring and management:
*<tt>qstat</tt>: display all (pending, running, ...) jobs of the user (output is empty if user has no jobs in the system).
*<tt>qstat -j <jobid></tt>: get a more verbose output, which is particularly useful when analyzing why your job won't run.
*<tt>qdel <jobid></tt>: kill job with specified ID (users can, of course, only kill their own jobs).
*<tt>qalter</tt>: Modify a pending or running job.
*<tt>qhost</tt>: display state of all hosts.
Note that there is also a GUI for SGE, invoked by the command <tt>qmon</tt>.
... tbc ...
==== Array jobs ====
... are a very efficient way of managing your jobs under certain circumstances (e.g., if you have to run one identical program many times on different data sets, with different initial conditions, etc.). Please see the corresponding [http://wikis.sun.com/display/GridEngine/Submitting+Extended+Jobs+and+Advanced+Jobs#SubmittingExtendedJobsandAdvancedJobs-SubmittingArrayJobs Section] in the official documentation of Sun Grid Engine
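A minimal sketch of an array job processing 100 numbered input files is shown below (file naming scheme and program are placeholders; SGE sets <tt>$SGE_TASK_ID</tt> to the index of the current task):
<pre>
#$ -t 1-100
#$ -l h_rt=2:0:0
#$ -l h_vmem=1G

# each task processes one input file
./myprog data_${SGE_TASK_ID}.dat > result_${SGE_TASK_ID}.out
</pre>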
...tbc...
=== Documentation  ===
* [http://wikis.sun.com/display/GridEngine/Using+Sun+Grid+Engine Sun Grid Engine User's Guide]
== Using the Altix UV 100 system  ==
The SGI system is used for very specific applications (in need of a large and highly performant shared-memory system) and can presently only be accessed by the Theoretical Chemistry group. Entitled users may login to the system via <tt>ssh</tt> (the same rules as for the login nodes of the main system apply, i.e. access is only possible from within the intranet of the University, otherwise you have to establish a VPN tunnel):
ssh abcd1234@uv100.hpc.uni-oldenburg.de
The Altix UV system has a RHEL 6.0 operating system installed.
As for the IBM cluster, the modules environment is used.
=== Compiling and linking applications ===
It is strongly recommended to use MPT (Message Passing Toolkit), SGI's own implementation of the MPI standard. It is only then that the highly specialized HW architecture of the system can be fully exploited.
The MPT module must be loaded both for compiling and (in general) at runtime in order for your application to find the dynamically linked libraries:
module load mpt
Note that MPT is not a compiler. SGI does not provide its own compilers for x86-64 based systems. One may, e.g., use the Intel compiler:
module load intel/ics
Basically, you can use the compiler the same way you are accustomed to and link against the MPT library by adding the flag <tt>-lmpi</tt>. See the documentation provided below, which also explains how to run MPI programs.
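A compile-and-run sketch using MPT together with the Intel compiler might look as follows (program name and task count are placeholders, and the launch line is only illustrative; consult the MPT documentation below for the details of running MPI programs):
<pre>
module load mpt
module load intel/ics

# compile and link against the MPT MPI library
icc -O2 -o myprog_mpt myprog.c -lmpi

# illustrative launch with 32 MPI tasks (see the MPT User's Guide)
mpirun -np 32 ./myprog_mpt
</pre>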
=== Submitting SGE jobs ===
Only entitled users can submit jobs to the SGI Altix UV system (the jobs of other users would never start). For a job that is to execute on the UV100 system, your SGE job submission script must contain the line
#$ -l uv100=true
From SGE's point of view, the system is treated as a single, big SMP node with 120 cores, 630 GB of main memory available for running jobs, and 11 TB of scratch disk space. So far only the parallel environment <tt>smp</tt> is configured on the system.
=== Documentation ===
* [http://techpubs.sgi.com/library/manuals/3000/007-3773-003/sgi_html/index.html MPT User's Guide]
*[http://docs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=linux&db=bks&srch=&fname=/SGI_Developer/LX_86_AppTune/sgi_html/front.html Application Tuning Guide for SGI X86-64 Based Systems]
(Both of the above are also available as PDF download.)




= Application Software and Libraries =
<center>
{| style="background-color:#eeeeff;" cellpadding="10" border="1" cellspacing="0"
|- style="background-color:#ddddff;"
!Compiler and Development Tools
!Quantum Chemistry
!Computational Fluid Dynamics
!Mathematics/Scripting
!Visualisation
!Libraries
|- valign="top"
|
* [[debugging]]
* [[git]]
* [[GNU Compiler]]
* [[Intel Compiler]]
* [[Open64 Compiler]]
* [[PGI Compiler]]
* [[Profiling_using_gprof| profiling]]
* [[scalasca]]
* [[subversion (svn)]]
* [[valgrind]]
|
* [[Gaussian 09]]
* [[MOLCAS]]
* [[MOLPRO]]
* [[NBO]]
* [[ORCA]]
|
* [[Ansys]]
* [[FOAMpro]]
* [[Nektar++]]
* [[Nek 5000]]
* [[OpenFOAM]]
* [[PALM]]
* [[STAR-CCM++]]
* [[THETA]]
* [[WRF/WPS]]
|
* [[Configuration MDCS]] (2014b and later)
* [[MATLAB Distributing Computing Server]]
* [[Python]]
* [[R]]
* [[STATA| STATA]]
|
* [[iso99]]
* [[NCL]]
* [[ncview]]
* [[paraview]]
|
* [[BLAS and LAPACK]]
* [[EGSnrc]]
* [[FLUKA]]
* [[GEANT4]]
* [[Gurobi]]
* [[HDF5]]
* [[Intel MPI]]
* [[LEDA]]
* [[NetCDF]]
* [[OpenMPI]]
|-
|}
</center>

== Quantum Chemistry ==

=== Gaussian 09 ===

==== Single-node (multi-threaded) jobs ====

You have to use the parallel environment <tt>smp</tt> to ensure that your job runs on a single host.
The following example illustrates this for a Gaussian job using 12 processors (CPU cores):
<pre>
#$ -l h_vmem=1900MB
#$ -l h_fsize=500G

#$ -pe smp 12
#$ -R y

module load gaussian
g09run myinputfile
</pre>

The total amount of memory reserved for the job is 12 <tt>x</tt> 1900 MB = 22.8 GB (remember that for parallel jobs, the value of <tt>h_vmem</tt> is multiplied by the number of slots), which is close to the maximum memory available on a standard compute node of HERO (23 GB). If you request fewer than 12 slots, the remaining slots may be filled by jobs of other users (provided there is enough memory and other resources available). Of course, you may also need to reserve sufficient local disk (scratch space) for your job (in the above example, 500 GB are requested).

The Gaussian input file <tt>myinputfile</tt> of the above example would then contain, e.g., the following lines in the link 0 section:
<pre>
%Mem=21000MB
%NProcShared=12
</pre>
 
'''Important''': Memory management is critical for the performance of Gaussian jobs. Which parameter values are optimal is highly dependent on the type of the calculation, the system size, and other factors. Therefore, optimizing your Gaussian job with respect to memory allocation almost always requires (besides experience) some trial and error. The following general remarks may be useful:
<ul>
<li>In the above example, we have told Gaussian to use almost all of the total memory reserved for the job (22.8 GB), leaving only a small margin of 1.8 GB, which is necessary, among other things, because the G09 executables are rather large and have to be resident in memory (a margin of about 1 GB should be sufficient in most cases). This is usually a good choice for '''DFT''' calculations.
<li> For '''MP2''' calculations, on the other hand, Gaussian requests about twice the amount of memory specified by the <tt>Mem=...</tt> directive. If this total (physical + virtual) memory requested by Gaussian is lower than the memory reserved for the job via the SGE <tt>h_vmem=...</tt> directive, the process stays in main memory. If it exceeds the memory reserved for the job, the operating system starts swapping, which may lead to a dramatic performance decrease. In that case, you may significantly speed up your calculation by giving Gaussian access to only half of the total memory reserved for the job, i.e., in the above example, a good starting point for an MP2 calculation would be:
<pre>
%Mem=11000MB
%NProcShared=12
</pre>
In any case, as mentioned above, testing and some trial and error are indispensable and well worth the effort!
</ul>
You may also want to check the [http://www.gaussian.com/g_tech/g_ur/m_eff.htm Efficiency Considerations] website of Gaussian Inc.
 
==== Linda jobs ====
For Gaussian multi-node (Linda) jobs, use the <tt>linda</tt> parallel environment (PE). The PE <tt>linda</tt> behaves quite differently from the other PEs, since a "slot" here means "the entire node", i.e., one "slot" represents 12 CPU cores. Moreover, to ensure that each Linda worker has exclusive access to the corresponding node (no jobs of other users running on the same node), it is necessary to set the <tt>excl</tt> attribute to <tt>true</tt>.
 
'''Example''': For a Linda job requesting four nodes (Linda workers) and 22 GB of memory per node, the relevant section of the submission script would be:
<pre>
#$ -l h_vmem=22G
 
#$ -pe linda 4 -l excl=true
#$ -R y
 
module load gaussian
g09run myinputfile
</pre>
 
The link 0 section of the input file <tt>myinputfile</tt> would then, e.g., contain the following lines:
<pre>
%LindaWorkers=
%NProcShared=12
%Mem=20000MB
</pre>
As for single-node jobs, you should carefully consider memory allocation. In the above example, we simply tell Gaussian that it can use all the memory reserved for the job on each node (allowing for an overhead of 2 GB), which may not be the optimal choice in all cases (see above).
 
For Linda jobs, the "<tt>%LindaWorkers=</tt>" directive is mandatory. The wrapper script parses the input file looking for the <tt>LindaWorkers</tt> keyword (anything after the "=" will be ignored) and, if found, fills in the correct node list. Note that the <tt>%NProcl</tt> directive of older Gaussian versions is deprecated and should no longer be used.
 
'''Important notes''':
<ul>
<li>Not all types of Gaussian calculations support Linda. Please check, by consulting the manual or submitting short (!) test jobs, if your Gaussian calculation runs under Linda.
<li>The efficiency of Linda jobs depends on the type of calculation, the system size, and many other factors. Of course, the remarks concerning memory management apply to Linda jobs as well. '''Please invest a little time in testing and, in particular, check the scaling of your Linda job'''; it may later save you a lot of work and speed up your calculations significantly. This can be done by running a (Linda-capable) job first on a single node, then on 2, and then on 4 nodes. On two nodes, your job should (ideally!) run twice as fast, and on four nodes four times as fast.
It does not make much sense to run a Linda job on four nodes if you "only" gain a speed-up of a factor 3 or less, since that would waste the resources of (at least) one compute node!
</ul>
<br>
 
=== MOLCAS  ===
 
The <span class="pops">[http://www.molcas.org MOLCAS]</span> package, developed by a team of researchers at Lund University and around the world, is a suite of programs designed for accurate ''ab-initio'' calculations of ground and excited states of electronic systems. It has been especially tailored for situations with highly degenerate states, and also allows for geometry optimizations. Due to licensing restrictions, MOLCAS is not available for all users.
 
==== How to submit a MOLCAS job  ====
 
In order to submit and run a MOLCAS job via SGE, you basically have to do three things:
 
* Select the parallel environment (PE) <tt>molcas</tt> (or <tt>molcas_long</tt> if your job runs more than 8 days, in which case you must also add the directive <tt>#$ -l longrun=true</tt>), and specify the number of slots; e.g., if your MOLCAS job is supposed to run on 16 cores, your SGE submission script would contain the directive: <pre>#$ -pe molcas 16</pre>
 
* Load the <tt>molcas</tt> environment module, i.e., add the line <pre>module load molcas</pre> to your job submission script
 
* Start the MOLCAS calculation as usual; e.g., the last lines of your SGE script might look as follows:<pre>export MOLCASMEM="1024"&#13;export MOLCASDISK="100000"&#13;&#13;export Project="MyProject"&#13;export Outfile="MyProject_${JOB_ID}.out"&#13;&#13;cp -p ${Project}.inp ${WorkDir}&#13;cd ${WorkDir}&#13;&#13;molcas ${Project}.inp > ${OutFile} 2>&1&#13;&#13;exit 0</pre>
Note that it is always a good idea to change into the (auto-generated) MOLCAS working directory <tt>${WorkDir}</tt> before starting the actual calculation; that way, all scratch files will go to the local disk and not pollute your home directory (here, the directory <tt>${PWD}</tt> from which the job was submitted). On the other hand, we save the output of the <tt>molcas</tt> command (a text file which usually is not very large) directly into the home directory. This may be useful for error analysis if the calculation fails, while everything in <tt>${WorkDir}</tt> will automatically be erased once the job has finished (or crashes). If you need to keep files from <tt>${WorkDir}</tt> for later usage, you have to copy them manually to your home directory (i.e., after the <tt>molcas</tt> command but before the <tt>exit</tt> command in your SGE script).
 
It is extremely important that you '''must not, under any circumstances, change the value of the <tt>WorkDir</tt> environment variable (i.e., the location of the working directory of MOLCAS)'''. The working directory is set by the <tt>molcas</tt> environment module, and if you change it, your MOLCAS job is likely to crash. You can access the working directory as usual and copy files to and from it (cf. the above example script), but please never change the location of the working directory itself!
 
==== How to configure a (parallel) MOLCAS job  ====
 
The proper configuration of a MOLCAS job is a highly non-trivial task. Learning how to do this is important not only for the sake of optimizing the performance of your own jobs, but also in order to ensure an '''efficient usage of our HPC resources'''. A wrong configuration of your MOLCAS jobs may lead to an unnecessary waste of resources which are then unavailable for other users.
 
First, note that every MOLCAS job is, by definition, a "parallel" job, i.e., you '''must''' specify the PE <tt>molcas</tt>. A "serial" MOLCAS job is a parallel job running on a single slot (<tt>#$ -pe molcas 1</tt>).
 
As a second, preliminary remark, you should be aware that only a subset of all MOLCAS modules is fully parallelized. The following modules are known to parallelize rather well (this list may be incomplete
and may change over time, since MOLCAS is under active development and the parallel features are rather new):
 
:SEWARD
:SCF
:RASSCF
:ALASKA
:MCKINLEY
:MCLR
 
The other modules may be run in parallel, but you won't obtain any speed-up. However, it is possible that you get a speed-up in a parallel run even for some of the other modules (e.g., CASPT2) if a certain computational task can be trivially parallelized (divided into independent sub-tasks). This seems to be the case, e.g., for a numerical gradients calculation (according to the MOLCAS developers).
 
Properly configuring your MOLCAS job basically requires the following three steps:
 
* '''Step 1:''' Perform a number of test runs under well-defined, reproducible conditions, starting with a single core, then using multiple cores of a single node, then running your job across nodes.
 
* '''Step 2:''' Carefully analyze the results and try to figure out what determines the performance of your job (often with MOLCAS, it's I/O)
 
* '''Step 3:''' Taking the conclusions of Step 2 into account, customize the SGE directives in your submission script accordingly
<br>
 
'''Example'''
 
We illustrate the above procedure with a "real-world" example, i.e., the calculation of an
'''excited state of a CO molecule adsorbed on one of the edges of a Buckminsterfullerene (C60) molecule'''
(courtesy of J. Mitschker).
This job makes use of the SEWARD and RASSCF modules and is rather I/O- and memory-intensive (as is characteristic of
many MOLCAS calculations), writing about 30 GB of scratch files to the local disk and doing a lot of disk writes
and, in particular, reads.
The value of the MOLCASMEM variable has been set to 1024 (corresponding to 1 GB per core).
Note that as to the virtual memory (per core) that you reserve for your SGE job using the <tt>h_vmem</tt>
attribute, you will have to factor in a significant overhead in order to avoid your job running out of memory.
How large this overhead is, depends on the actual calculation and must be found by trial and error.
An overhead by a factor of 1.5 - 2 is usually enough.
 
Increasing the value of MOLCASMEM significantly does not speed up the calculation but, on the contrary, leads
to a performance degradation (contrary to what one might naively expect).
 
On a single core (on a node on which no other job is running) the above job takes
<center>'''8 h 50 min'''</center>
to complete.
 
We now perform the three steps necessary for finding the correct configuration of the job as outlined above:
* '''Step 1:''' Test runs<br> To run the job on a single node on 1, 2, 4, 6, 8, ... cores, we set the <tt>exclusive</tt> attribute to <tt>true</tt> and request the corresponding number of slots (<tt>h_vmem</tt> will be chosen to be <tt>1900M</tt>). E.g., to run the job on 4 cores of a single node, the relevant lines of the submission script are<pre>#$ -l h_vmem=1900M&#13;&#13;#$ -l exclusive=true&#13;&#13;#$ -pe molcas 4&#13;</pre> The exclusive node reservation ensures that no other job will be running on that same node.<br>Now, to run the job across nodes, we again take advantage of exclusive node reservation and adjust the value of <tt>h_vmem</tt> to control how many cores per node are actually used by the job. Recall that a standard, mpcs-class node has <tt>23G</tt> (<tt>23000M</tt>) of memory available to user jobs. Thus, in order for your job to use
**'''exactly 2 cores per node''', choose <tt>h_vmem=11500M</tt> (or any value &gt; <tt>5750M</tt> and &le; <tt>11500M</tt>),
**'''exactly 4 cores per node''', choose <tt>h_vmem=5750M</tt> (or any value &gt; <tt>2875M</tt> and &le; <tt>5750M</tt>),
**'''...'''
<ul>The total number of slots requested then determines how many nodes will be used. E.g., if you request 24 slots and choose <tt>h_vmem=5750M</tt>, your job will be running on 6 nodes using exactly 4 cores per node.</ul>
 
 
* '''Step 2:''' Analyzing results<br> The following figure shows the speed-up obtained for various configurations (single node, across nodes with varying number of cores per node) as a function of the total number of cores, N:<br>[[File:MOLCAS_Parallel_FullereneTestCase.jpg]]<br> The speed-up is defined by<br><center>speed-up = runtime on a single core / runtime on N cores</center><br> The results show that
** it makes no sense to use more than about 4 cores per node (if the job runs on more than 4 cores per node, the speed-up curves become flatter, such that the performance gained by the additional cores does not justify the waste of resources),
** the main performance gain is obtained if the job is spread across nodes, since that way the I/O load is distributed more evenly (memory bandwidth may also play a role).
<ul>The contention of MPI tasks fighting for memory and, in particular, I/O bandwidth may be the reason for the poor behavior if more than 4 cores per node are used. Clearly, the behavior of this job is dominated by I/O, and in order to optimize the performance, the I/O load must be spread across many nodes. It is also clear that (at least for this calculation) MOLCAS does not parallelize to more than a few dozen of cores. Therefore, it makes no sense to request more than, say, a total of 32 slots in this example.</ul>
 
* '''Step 3:''' Customizing job submission script<br> The above conclusions suggest that the job should be configured as follows:
** No more than 4 cores per node should be used, thus we may set, e.g., <tt>h_vmem=5000M</tt>.
** The total number of cores should not be larger than 32 (e.g., 16 or 24 would be a decent choice); here we take the number of cores to be 32.
<ul>An important question is whether we should use exclusive node reservation as for the test runs. In fact, using exclusive node reservation would be a waste of resources and should therefore '''not''' be used. The job would always claim 8 nodes, but use only a fraction of the cores (32 out of 96) and of the total memory available. This would be '''inefficient use of cluster resources'''! A better alternative is to omit the <tt>#$ -l exclusive=true</tt> directive, and let the queueing system distribute the job "freely" across the cluster (i.e., SGE will try to allocate free cores, not free nodes). This may even improve the performance since the job is likely to get distributed across more than 8 nodes (using, e.g., only one or two cores on most of the nodes, while all or part of the remaining cores are occupied by other jobs).</ul>
<ul>In general, other jobs will then run on the same nodes, too. This is of no harm (neither for the MOLCAS job, nor for the other jobs), there is only one point to keep in mind. If there were another I/O intensive job (e.g., another MOLCAS job) running on the same node, this would adversely affect the performance of '''all''' jobs on the node. Avoiding such a situation is not trivial, but a simple measure is to request a large amount of disk space (e.g. <tt>700G</tt>), which usually should keep other I/O intensive jobs at bay (recall that a standard, mpcs-class node has <tt>800G</tt> of requestable disk space).</ul>
 
<ul>Summarizing, the relevant lines of the SGE submission script may hence look as follows:<pre>#$ -l h_vmem=5000M&#13;&#13;#$ -l h_fsize=700G&#13;&#13;#$ -pe molcas 32</pre></ul>
<br>
 
Using the above configuration, the job (in a "typical" run) took
<center>'''40 min'''</center>
to complete and ran on 20 nodes (using between 1 and 2 cores per node).
All nodes were shared among jobs of different users.
Note that this is a '''speed-up by a factor of more than 13''' compared to the single-core calculation and an impressive demonstration of what can be achieved by properly configuring your MOLCAS (and other parallel) jobs,
while at the same time leaving a minimal footprint and using the HPC resources in a very efficient manner!
 
A final remark: the '''CASPT2 module''' is even more memory- and I/O-intensive than the SEWARD and RASSCF modules used in the above job. If the above RASSCF calculation is supplemented by a CASPT2 calculation (for this, the values
of MOLCASMEM and <tt>h_vmem</tt> must be increased significantly), the job takes
<center>'''47 h'''</center>
to complete on a single core (there is no speed-up by using more than one core, since CASPT2 itself is not parallelized).
If, on the other hand, the same job runs on one of the big nodes instead of on one of the standard-class nodes (this requires the directive <tt>#$ -l bignode=true</tt>), the job completes in a mere
<center>'''9 h'''</center>
This performance gain by a factor of more than 5 is due to the high-performance SAS disk array with which every big node is equipped. Clearly, I/O-intensive jobs like MOLCAS CASPT2 calculations benefit strongly from this I/O performance. The big nodes should therefore '''only''' be used for such highly specialized tasks, and not for the usual run-of-the-mill jobs.
 
==== Known issues  ====
 
* Parallelization in MOLCAS is relatively recent, and you should not be too surprised if a parallel job fails. Errors may also come from the <span class="pops">[http://www.emsl.pnl.gov/docs/global/ Global Arrays Toolkit]</span>, on which all of the parallelization of MOLCAS rests. Running the job on a single core usually helps.
* The CASPT2 module sometimes (depending on the job) fails to run across nodes (it does run on multiple cores of a single node, though). This should not be too much of a problem, since CASPT2 is not fully parallelized, and running the module in parallel does not yield a performance gain anyway (cf. the remark above).
 
 
==== Documentation  ====
 
There is a detailed [http://www.hpc.uni-oldenburg.de/MOLCAS/MOLCAS-7.6_Manual.pdf User Manual] (over 500 pages)
as well as a [http://www.hpc.uni-oldenburg.de/MOLCAS/MOLCAS-7.6_ShortGuide.pdf Short Guide].
Those who want to dig deeper into the inner workings of MOLCAS may consult the [http://www.hpc.uni-oldenburg.de/MOLCAS/MOLCAS-7.6_ProgrammingGuide.pdf Programming Guide].
 
<br>
 
=== MOLPRO ===
 
Coming soon ...
 
... tbc ...
 
<br>
 
== MATLAB ==
 
To submit a MATLAB job, you must first load the environment module in your submission script:
 
module load matlab
 
This automatically loads the newest version, if several versions are installed.
After that, invoke MATLAB in batch mode:
 
matlab -nojvm -nodisplay -r mymatlab_input
 
where <tt>mymatlab_input.m</tt> (a so-called "'''.m-file'''") is an ASCII text file
containing the sequence of MATLAB commands that you would normally enter in an interactive session.
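Putting this together, a submission script for a serial MATLAB job might look like the following sketch (resource values and the name of the .m-file are placeholders):
<pre>
#$ -S /bin/bash
#$ -N matlabjob
#$ -cwd
#$ -l h_rt=12:0:0
#$ -l h_vmem=4G

module load matlab
matlab -nojvm -nodisplay -r mymatlab_input
</pre>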
 
 
=== Documentation ===
 
* [http://www.mathworks.de/help/releases/R2010b/techdoc/learn_matlab/bqr_2pl.html Getting Started Guide] (also available [file:MATLAB_GettingStarted_R2010b.pdf as a printable PDF])
 
<br>


== LEDA (''L''ibrary of ''E''fficient ''D''ata types and ''A''lgorithms) ==

To set the correct paths and environment variables, load the appropriate module:

module load leda/6.3

for the single-threaded version, or

module load leda/6.3-mt

in the case of the multi-threaded library.


== CFD (Computational Fluid Dynamics) ==

=== STAR-CCM+ ===

For further information, see the <span class="pops">[http://www.hpc.uni-oldenburg.de/starccmplus/online Online Documentation]</span> (only accessible internally).


= Courses and Tutorials =
<center>
{| style="background-color:#eeeeff;" cellpadding="10" border="1" cellspacing="0"
|- style="background-color:#ddddff;"
!Introduction to HPC Courses
!Matlab Tutorials
!New OS
|- valign="top"
|
* [[HPC Introduction October 6-8, 2014]]
* [[HPC Introduction October 7-9, 2015]]
|
* [[Audio Data Processing]]
* [[Using the MEX Compiler]]
|
* [[media:New_OS_On_FLOW.pdf | New OS on FLOW ]]
|-
|}
</center>


= Contact =


<br>
<center>
<br>
{| style="background-color:#eeeeff;" cellpadding="10" border="1" cellspacing="0"
|- style="background-color:#ddddff;"
!HPC Resource
!EMail
|- valign="top"
|
FLOW and HERO<br>
Both (in case of vacation)<br>
|
Stefan.Harfst@uni-oldenburg.de<br>
hpcuniol@uni-oldenburg.de<br>
|-
|}
</center>


= Advanced Usage  =


Here you will find, among other things, hints on how to analyze and optimize your programs using HPC tools (profilers, debuggers, performance libraries), and other useful information.
'''''Note:''' This Wiki is under construction and a preliminary version! Contributions are welcome. Please ask Stefan Harfst (Stefan.Harfst(at)uni-oldenburg.de) for further information.''


... tbc ...
<center>
''Only for editors: [[Formatting rules for this Wiki]]''
</center>


</div>
</div>
[[HPC User Wiki 2016]]
