Welcome to the HPC User Wiki of the University of Oldenburg

'''Note''': This is a first, '''preliminary''' version (v0.01) of the HPC User Wiki. Its primary purpose is to get you started with our new clusters (FLOW and HERO), enabling you to familiarize yourself with these systems and gather some experience. More elaborate, improved versions will follow, so you may want to check these pages regularly.
__NOTOC__
__NOEDITSECTION__
<div style="text-align:justify;">
<center>
{| style="text-align:justify;font-size:1.2em;line-height:1.2em;background-color:#eeeeff;" border="1" cellspacing="0"
|-
| [[Image:picture_of_nodes.jpg|155px]]
| [[Image:picture_of_cluster_closed.jpg|70px]]
| ''This is the HPC-Wiki of the University of Oldenburg''<br>
| [[Image:picture_of_gpfs.jpg|82px]]
| [[Image:picture_of_infinyband.jpg|155px]]
|}
</center>


= Basic Information =
<center>
{| style="background-color:#eeeeff;" cellpadding="10" border="1" cellspacing="0"
|- style="background-color:#ddddff;"
! HPC Facilities
! Login
! User environment
! Compiling and linking
! Job Management (Queueing) System
! Altix UV 100 system
! Examples
|- valign="top"
|
* [[HPC Facilities of the University of Oldenburg| Overview]]
* [[HPC Facilities of the University of Oldenburg#FLOW| FLOW]]
* [[HPC Facilities of the University of Oldenburg#HERO| HERO]]
* [[HPC Policies| HPC Policies]]
* [[Unix groups| Groups ]]
* [[Acknowledging_the_HPC_facilities| Acknowledging FLOW/HERO]]
* [[User Meetings]]
|
* [[Logging in to the system#From within the University (intranet) | From University]]
* [[Logging in to the system#From outside the University (internet) | From Home]]
|
* [[User environment - The usage of module| Usage of module]]
* [[File system| File System / Quotas]]
* [[Mounting Directories of FLOW and HERO#Windows | Shares under Windows]]
* [[Mounting Directories of FLOW and HERO#Linux | Shares under Linux]]
* [[License servers]]
|
* [[Compiling and linking|Basics]]
* [[GNU Compiler]]
* [[Intel Compiler]]
* [[PGI Compiler]]
* [[Open64 Compiler]]
* [[Using the Altix UV 100 system#Compiling and linking applications| Altix UV 100]]
|
* [[SGE Job Management (Queueing) System| Overview]]
* [[SGE Job Management (Queueing) System#Submitting jobs| Submitting ]]
* [[SGE Job Management (Queueing) System#Specifying job requirements| Job requirements ]]
* [[SGE Job Management (Queueing) System#Parallel environments (PEs) | Parallel jobs ]]
* [[SGE Job Management (Queueing) System#Interactive jobs | Interactive jobs ]]
* [[SGE Job Management (Queueing) System#Monitoring and managing your jobs | Commands ]]
* [[SGE Job Management (Queueing) System#Array jobs| Job arrays ]]
* [[SGE Job Management (Queueing) System#Environment variables | Environment variables]]
* [[Brief_Introduction_to_HPC_Computing#Checking_the_status_of_the_job | Checking the job status]] [[Brief_Introduction_to_HPC_Computing#Checking_the_status_of_the_job_2| (par. jobs)]]
* [[Brief_Introduction_to_HPC_Computing#Details_for_finished_jobs| Obtaining details for finished jobs]]
* [[SGE Job Management (Queueing) System#Documentation | Documentation]]
* [[Queues_and_resource_allocation| On Queues and resource allocation]]
|
* [[Using the Altix UV 100 system#Compiling and linking applications| Compiling]]
* [[Using the Altix UV 100 system#Submitting SGE jobs| Submitting]]
* [[Using the Altix UV 100 system#Documentation| Documentation]]
|
* [[Brief Introduction to HPC Computing| Brief Introduction to HPC Computing]]
* [[Matlab Examples using MDCS| Matlab examples using MDCS]]
* [[MDCS Basic Example]] (for R2014b and later)
* [[HPC Tutorial No1| HPC Tutorial 2013]]
* [[HPC Introduction October 6-8, 2014| HPC Tutorial 2014]]
* [[HPC Introduction October 7-9, 2015| HPC Tutorial 2015]]
|-
|}
</center>

= Application Software and Libraries =
<center>
{| style="background-color:#eeeeff;" cellpadding="10" border="1" cellspacing="0"
|- style="background-color:#ddddff;"
!Compiler and Development Tools
!Quantum Chemistry
!Computational Fluid Dynamics
!Mathematics/Scripting
!Visualisation
!Libraries
|- valign="top"
|
* [[debugging]]
* [[git]]
* [[GNU Compiler]]
* [[Intel Compiler]]
* [[Open64 Compiler]]
* [[PGI Compiler]]
* [[Profiling_using_gprof| profiling]]
* [[scalasca]]
* [[subversion (svn)]]
* [[valgrind]]
|
* [[Gaussian 09]]
* [[MOLCAS]]
* [[MOLPRO]]
* [[NBO]]
* [[ORCA]]
|
* [[Ansys]]
* [[FOAMpro]]
* [[Nektar++]]
* [[Nek 5000]]
* [[OpenFOAM]]
* [[PALM]]
* [[STAR-CCM++]]
* [[THETA]]
* [[WRF/WPS]]
|
* [[Configuration MDCS]] (2014b and later)
* [[MATLAB Distributing Computing Server]]
* [[Python]]
* [[R]]
* [[STATA| STATA]]
|
* [[iso99]]
* [[NCL]]
* [[ncview]]
* [[paraview]]
|
* [[BLAS and LAPACK]]
* [[EGSnrc]]
* [[FLUKA]]
* [[GEANT4]]
* [[Gurobi]]
* [[HDF5]]
* [[Intel MPI]]
* [[LEDA]]
* [[NetCDF]]
* [[OpenMPI]]
|-
|}
</center>

= Courses and Tutorials =
<center>
{| style="background-color:#eeeeff;" cellpadding="10" border="1" cellspacing="0"
|- style="background-color:#ddddff;"
!Introduction to HPC Courses
!Matlab Tutorials
!New OS
|- valign="top"
|
* [[HPC Introduction October 6-8, 2014]]
* [[HPC Introduction October 7-9, 2015]]
|
* [[Audio Data Processing]]
* [[Using the MEX Compiler]]
|
* [[media:New_OS_On_FLOW.pdf | New OS on FLOW ]]
|-
|}
</center>

= Contact =
<center>
{| style="background-color:#eeeeff;" cellpadding="10" border="1" cellspacing="0"
|- style="background-color:#ddddff;"
!HPC Resource
!E-Mail
|- valign="top"
|
FLOW and HERO<br>
Both (in case of vacation)<br>
|
Stefan.Harfst@uni-oldenburg.de<br>
hpcuniol@uni-oldenburg.de<br>
|-
|}
</center>

'''''Note:''' This Wiki is under construction and a preliminary version! Contributions are welcome. Please ask Stefan Harfst (Stefan.Harfst(at)uni-oldenburg.de) for further information.''

<center>
''Only for editors: [[Formatting rules for this Wiki]]''
</center>
</div>

[[HPC User Wiki 2016]]

= Introduction =

Presently, the central HPC facilities of the University of Oldenburg comprise three systems:

*FLOW ('''F'''acility for '''L'''arge-Scale C'''O'''mputations in '''W'''ind Energy Research)<br> IBM iDataPlex cluster solution, 2232 CPU cores, 6 TB of (distributed) main memory, QDR InfiniBand interconnect (theoretical peak performance: 24 TFlop/s).

*HERO ('''H'''igh-'''E'''nd Computing '''R'''esource '''O'''ldenburg)<br>Hybrid system composed of two components:
**IBM iDataPlex cluster solution, 1800 CPU cores, 4 TB of (distributed) main memory, Gigabit Ethernet interconnect (theoretical peak performance: 19.2 TFlop/s),
**SGI Altix UltraViolet shared-memory system ("SMP" component), 120 CPU cores, 640 GB of globally addressable memory, NumaLink5 interconnect (theoretical peak performance: 1.3 TFlop/s).

*[http://www.csc.uni-oldenburg.de GOLEM]: older, AMD Opteron-based cluster with 390 cores and 800 GB of (distributed) main memory (theoretical peak performance: 1.6 TFlop/s).

FLOW and HERO use a common, shared storage system (high-performance NAS cluster) with a net capacity of 130 TB.

FLOW is employed for computationally demanding CFD calculations in wind energy research, conducted by the Research Group [http://twist.physik.uni-oldenburg.de/en/index.html TWiST] (Turbulence, Wind Energy, and Stochastics) and the [http://www.forwind.de/forwind/index.php?article_id=1&clang=1 ForWind] Center for Wind Energy Research. It is, to the best of our knowledge, the largest system in Europe dedicated solely to that purpose.

The main application areas of the HERO cluster are Quantum Chemistry, Theoretical Physics, and the Neurosciences and Audiology. Besides that, the system is used by many other research groups of the [http://www.fk5.uni-oldenburg.de Faculty of Mathematics and Science] and the [http://www.informatik.uni-oldenburg.de Department of Informatics] of the School of Computing Science, Business Administration, Economics, and Law.

= Hardware Overview =

== FLOW ==

*122 "low-memory" compute nodes: IBM dx360 M3, dual socket (Westmere-EP, 6C, 2.66 GHz), 12 cores per server, 24 GB DDR3 RAM, diskless (host names <tt>cfdl001..cfdl122</tt>).

*64 "high-memory" compute nodes: IBM dx360 M3, dual socket (Westmere-EP, 6C, 2.66 GHz), 12 cores per server, 48 GB DDR3 RAM, diskless (host names <tt>cfdh001..cfdh064</tt>).

*QDR InfiniBand interconnect (fully non-blocking), 198-port Mellanox IS5200 IB switch (can be extended up to 216 ports).

*Gigabit Ethernet for File-I/O etc.

*10/100 Mb/s Ethernet for management and administrative tasks (IPMI).

== HERO ==

*130 "standard" compute nodes: IBM dx360 M3, dual socket (Westmere-EP, 6C, 2.66 GHz), 12 cores per server, 24 GB DDR3 RAM, 1 TB SATAII disk (host names <tt>mpcs001..mpcs130</tt>).

*20 "big" compute nodes: IBM dx360 M3, dual socket (Westmere-EP, 6C, 2.66 GHz), 12 cores per server, 48 GB DDR3 RAM, RAID 8 x 300 GB 15k SAS (host names <tt>mpcb001..mpcb020</tt>).

*Gigabit Ethernet II for communication of parallel jobs (MPI, LINDA, ...).

*Second, independent Gigabit Ethernet for File-I/O etc.

*10/100 Mb/s Ethernet for management and administrative tasks (IPMI).

*SGI Altix UV 100 shared-memory system, 10 CPUs (Nehalem-EX, "Beckton", 6C, 2.66 GHz), 120 cores in total, 640 GB DDR3 RAM, NumaLink5 interconnect, RAID 20 x 600 GB SAS 15k rpm (host <tt>uv100</tt>).
 
The 1 Gb/s leaf switches have uplinks to a 10 Gb/s backbone (two switches, redundant). The central management interface of both clusters runs on two master nodes (IBM x3550 M3) in an HA setup. Each cluster has two login nodes (IBM x3550 M3).
 
Operating system: '''Scientific Linux 5.5'''
 
Cluster management software: '''Bright Cluster Manager 5.1''' by [http://www.clustervision.com ClusterVision B.V.]
 
= Basic Usage  =
 
== Logging in to the system  ==
 
=== From within the University (intranet)  ===
 
Within the internal network of the University, access to the systems is granted via ssh. Use your favorite ssh client, such as OpenSSH or PuTTY. For example, on a UNIX/Linux system, users of FLOW may type on the command line (replace "abcd1234" by your own account):
 
ssh abcd1234@flow.hpc.uni-oldenburg.de
 
Similarly, users of HERO log in by typing:
 
ssh abcd1234@hero.hpc.uni-oldenburg.de
 
Use "<tt>ssh -X</tt>" for X11 forwarding (i.e., if you need to export the graphical display to your local system).
 
For security reasons, access to the HPC systems is denied from certain subnets. In particular, you cannot login from the WLAN of the University (uniolwlan) or from "public" PCs (located, e.g., in Libraries, PC rooms, or at other places).
 
=== From outside the University (internet)  ===
 
First, you have to establish a VPN tunnel to the University intranet. After that, you can log in to HERO or FLOW via ssh as described above. The connection data for the tunnel are:
 
Gateway      &nbsp;: vpn2.uni-oldenburg.de
Group name  &nbsp;: hpc-vpn
Group password: hqc-vqn
 
Cf. the [http://www.itdienste.uni-oldenburg.de/21240.html instructions] of the IT Services on how to configure the Cisco VPN client. For the HPC systems, a separate VPN tunnel has been installed, which is only accessible for users of FLOW and HERO. Therefore, you have to configure a new VPN connection and enter the data provided above. For security reasons, you cannot login to FLOW or HERO if you are connected to the intranet via the "generic" VPN tunnel of the University.
 
== User Environment  ==
 
We use the modules environment, which is very flexible and user-friendly and even allows different versions of the same software to be used concurrently on the same system. You can see a list of all available modules by typing
module avail
 
To load a given module:
module load <name of the module>
 
The modules system uses a hierarchical file structure, i.e., sometimes (whenever there are ambiguities) you may have to specify a path, as in:
module load fftw2/gcc/64/double
 
To revert all changes made by a given module (environment variables, paths, etc.):
module unload <name of the module>
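For example, a typical sequence for setting up a build environment might look like this (a sketch; the module names follow the listing further below and may differ in detail):
<pre>
module avail                 # list all available modules
module load intel/ics        # load the Intel compiler suite
module load intel/impi       # load the Intel MPI environment
module list                  # show the currently loaded modules
module unload intel/impi     # revert the changes made by that module
</pre>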
 
 
== Compiling and linking  ==
 
This section will be elaborated later and then provide much more detailed information. For the time being, we only give a '''very''' brief overview.
 
The following compilers and MPI libraries are currently available:
 
* GCC, the GNU Compiler Collection: <tt>gcc</tt> Version 4.3.4<pre>module load gcc</pre>This module is loaded by default when you log in to the system. Supported MPI libraries: OpenMPI, MPICH, MPICH2, MVAPICH, and MVAPICH2.
 
* Intel Cluster Studio 2011, formerly known as Intel Cluster Toolkit Compiler Edition (contains the ''Math Kernel Library'' and other performance libraries, analyzers, and HPC tools):<pre>module load intel/ics</pre>The environment for the Intel MPI library must be loaded separately:<pre>module load intel/impi</pre>The Fortran compiler is invoked by <tt>ifort</tt>, and the C/C++ compiler by <tt>icc</tt>. However, if you want to build MPI applications, you should generally use the wrapper scripts <tt>mpif77</tt>, <tt>mpif90</tt>, <tt>mpicc</tt>, ...
 
* PGI Cluster Development Kit, Version 11.3: contains a suite of Fortran and C/C++ compilers as well as various other tools (MPI debugger etc.):<pre>module load pgi</pre>The compilers are invoked by <tt>pgf77</tt>, <tt>pgf95</tt>, ... for Fortran and <tt>pgcc</tt>, <tt>pgcpp</tt>, ... for C/C++, respectively. Again, wrapper scripts exist for building MPI applications.<br>Supported MPI libraries: MPICH, MPICH2, and MVAPICH.
(At the moment, MPICH and MPICH2 have problems running under the queueing system and thus their use is not recommended, but that problem will be fixed soon.)
 
It is planned to extend the MPI support for the various compilers. In particular, OpenMPI will soon be available for the Intel compiler, too.
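As a minimal sketch (assuming a C source file <tt>hello_mpi.c</tt> of your own), building an MPI application with the Intel toolchain and its wrapper script could look like this:
<pre>
module load intel/ics        # Intel compilers (ifort, icc)
module load intel/impi       # Intel MPI library and wrapper scripts

# the wrapper adds the required MPI include and library flags automatically
mpicc -O2 -o hello_mpi hello_mpi.c
</pre>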
 
==== Documentation  ====
 
*[http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/fortran/lin/index.htm Intel Fortran compiler User and Reference Guides]
 
*[http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/cpp/lin/index.htm Intel C/C++ Compiler]
 
*[http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/start/lin/cpp/index.htm Intel Getting started tutorial]
 
*[http://www.pgroup.com/doc/pgiug.pdf PGI User's Guide (PDF)]
 
== Job Management (Queueing) System  ==
 
The queueing system employed to manage user jobs for FLOW and HERO is [http://wikis.sun.com/display/GridEngine/Home Sun Grid Engine] (SGE). For first-time users (especially those acquainted with PBS-based systems), some features of SGE may seem a little unusual and certainly need some getting-accustomed-to. In order to efficiently use the available hardware resources (so that all users may benefit the most from the system), a basic understanding of how SGE works is indispensable. Some of the points to keep in mind are the following:
 
*Unlike other (e.g., PBS-based) queueing systems, SGE does not "know" the concept of "nodes" with a fixed number of CPUs (cores), where users specify the number of nodes they need, along with the number of CPUs per node, in their job requirements. Instead, SGE logically divides the cluster into '''slots''', where each "slot" may be thought of as a single CPU core. The scheduler assigns free slots to pending jobs. Since in the multi-core era each host offers many slots, this will, in general, lead to jobs of different users running concurrently on the same host (provided that there are sufficient resources like memory, disk space, etc. to meet all requirements of all jobs, as specified by the users who submitted them) and usually guarantees efficient resource utilization.
 
*While the scheduling behavior described above may be very efficient in optimally using the available hardware resources, it will have undesirable effects on parallel (MPI, LINDA, ...) jobs. E.g., an MPI job requesting 24 slots could end up running 3 tasks on one host, 12 tasks on another host (fully occupying this host, if it is a server with 2 six-core CPUs, as happens with our clusters), and 9 tasks on a third host. Clearly, such an unbalanced configuration may lead to problems. For certain jobs (multithreaded applications) it is even mandatory that all slots reside on one host (typical examples: OpenMP programs, Gaussian single-node jobs).<br> To deal with the specific demands of parallel jobs, SGE offers so-called '''parallel environments (PEs)''' which are largely configurable. Even if your job does not need several hosts, but runs on only one host using several or all cores of that host, you '''must''' specify a parallel environment. '''It is of crucial importance to choose the "correct" parallel environment''' (meeting the requirements of your application/program) when submitting a parallel job.
 
*Another "peculiarity" of SGE (as compared to its cousins) are the concepts of '''cluster queues''' and '''queue instances'''. Cluster queues are composed of several (typically, many) queue instances, with each instance associated with one particular host. A cluster queue may have a name like, e.g., ''standardqueue.q'', where the .q suffix is a commonly followed convention. Then the queue instances of this queue has names like, e.g. ''standardqueue.q@host001'', ''standardqueue.q@host002'', ... (note the "@" which acts as a delimiter between the queue name and the queue instance).<br> In general, each host will hold several queue instances belonging to different cluster queues. E.g. there may be a special queue for long-running jobs and a queue for shorter jobs, both of which share the same "physical" machines but have different policies. To avoid oversubscription, resource limits can be configure for individual hosts. Since resource limits and other, more complex attributes can also be associated with cluster queues and even queue instances, the system is highly flexible and can be customized for specified needs. On the other hand, the configuration quickly tends to get rather complex, leading to unexpected side effects. E.g., PEs grab slots from all queue instances of all cluster queues they are associated with. Thus, a parallel job may occupy slots on one particular host belonging to different queue instances on that host. While this is usually no problem for the parallel job itself, it blocks resources in both cluster queues which may be unintended. For that reason, it is common practice to associate each PE with one and only one cluster queue and define several (possibly identically configured) PEs in order to avoid that a single PE spans several cluster queues.
 
==== Submitting jobs  ====
 
Sample job submission scripts for both serial and parallel jobs are provided in the subdirectory <tt>Examples</tt> of your home directory. You may have to adapt these scripts as needed. Note that a job submission script consists of two principal parts:
 
*SGE directives (lines starting with the "magic" characters <tt>#$</tt>), which fall into three categories:
**general options (which shell to use, name of the job, name of output and error files if differing from default, etc.). The directives are passed to the <tt>qsub</tt> command when the job is submitted.
**Resource requirements (introduced by the <tt>-l</tt> flag), like memory, disk space, runtime (wallclock) limit, etc.
**Options for parallel jobs (parallel environment, number of job slots, etc.)
 
*Commands to be executed by the job (your program, script, etc.), including the necessary set-up of the environment for the application/program to run correctly (loading of modules so that your programs find the required runtime libraries, etc.).
 
The job is submitted by the <tt>qsub</tt> command, e.g. (assuming your submission script is named <tt>myprog.sge</tt>):
 
qsub myprog.sge
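For orientation, a minimal submission script for a serial job might look like the following sketch (the resource values and the module to load are placeholders; adapt them to your application and compare with the scripts in <tt>Examples</tt>):
<pre>
#!/bin/bash
#$ -S /bin/bash              # shell used by the job
#$ -cwd                      # run the job in the current working directory
#$ -N myjob                  # name of the job
#$ -l h_rt=12:0:0            # requested wallclock runtime (12 hours)
#$ -l h_vmem=2G              # requested memory per slot
#$ -l h_fsize=50G            # requested local scratch space (HERO only)

module load gcc              # set up the environment for your program
./myprog                     # the command(s) the job actually executes
</pre>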
 
==== Specifying job requirements  ====
 
The general philosophy behind SGE is that you should not explicitly submit your job to a specific queue or queue instance (although this is possible in principle), but rather define your requirements and then let SGE decide which queue matches them and where your job runs best (taking into account the current load of the system and other factors). For this "automatic" queue selection to work efficiently, it is important that you specify your job requirements carefully. The following points are relevant to both serial and parallel jobs:
 
*Maximum (wallclock) runtime is specified by <tt>-l h_rt=&lt;hh:mm:ss&gt;</tt>. E.g., a maximum runtime of three days is defined by <pre>#$ -l h_rt=72:0:0</pre> All cluster queues except the "long" queues have a maximum allowed runtime of 8 days. It is highly recommended that you specify the runtime of your job as closely as possible and reasonable (leaving a margin of error, of course!). If the scheduler knows that, e.g., your pending job is a fast run (requiring, e.g., only a few hours), it is likely that it gets executed much earlier (the so-called '''backfilling''' mechanism).

*If your job needs more than 8 days of runtime, your submission script must contain a line like:<pre>#$ -l longrun=true</pre>It is then automatically transferred to one of the "long" queues, which have no runtime limit. The number of long-running jobs per user is limited.

*Maximum memory usage is defined by the <tt>h_vmem</tt> attribute, as in<pre>#$ -l h_vmem=4G</pre>for a job requesting 4 GB of main memory. '''Note''': the above attribute refers to the memory '''per job slot''' (CPU core), i.e., it gets multiplied by the number of slots the parallel job requested (see below).
 
*The standard compute nodes of HERO (<tt>mpcs001..mpcs130</tt>) offer 23 GB of memory in total, whereas the "low-memory" nodes of FLOW (<tt>cfdl001..cfdl122</tt>) have a limit of 22 GB (these nodes are diskless, therefore the operating system also resides in RAM). If your job needs one of the "big" nodes of HERO (<tt>mpcb001..mpcb020</tt>) offering 46 GB of RAM, you need to specify your memory requirements '''and''' set the Boolean attribute <tt>bignode</tt> to <tt>true</tt>. Assuming, e.g., a parallel job in the SMP PE (see below) requesting 12 slots, which always runs on one host, this may look like:

 #$ -l h_vmem=3G
 #$ -l bignode=true

Similarly, to request one of the "high-memory" nodes of FLOW (<tt>cfdh001..cfdh064</tt>) offering 46 GB of RAM, you must specify (assuming, e.g., an MPI job running 12 tasks per node):

 #$ -l h_vmem=3G
 #$ -l highmem=true
 
in your submission script.
 
*Specifying required local disk space for your job (HERO cluster only):<pre>#$ -l h_fsize=200G</pre>for requesting 200 GB of scratch space. The standard nodes offer a maximum of 900 GB of local disk space for all jobs running on them. If your job needs more than 900 GB of scratch space, you must request one of the big nodes (offering 2100 GB of disk space), as in, e.g.:

 #$ -l h_fsize=1400G
 #$ -l bignode=true

Note that several of the above options may have to be combined. For example, for a long job generating huge scratch files you have to specify both <tt>-l longrun=true</tt> and <tt>-l bignode=true</tt>.
 
The path to the local scratch directory can be accessed in your job submission script via the <tt>$TMPDIR</tt> environment variable.
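Putting the above options together, the resource-request part of a long-running job with large scratch files on one of the big HERO nodes might look like this (the values are placeholders):
<pre>
#$ -l h_rt=240:0:0           # requested runtime beyond 8 days ...
#$ -l longrun=true           # ... therefore the job must go to a "long" queue
#$ -l h_vmem=3G              # memory per slot
#$ -l h_fsize=1400G          # more than 900 GB of scratch space ...
#$ -l bignode=true           # ... therefore one of the big nodes is required

# work in the local scratch directory assigned to the job
cd $TMPDIR
</pre>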
 
==== Parallel environments (PEs)  ====
 
'''Example''': If you have an MPI program compiled and linked with the Intel Compiler and MPI library,
your job submission script would contain the following lines:
#$ -pe intelmpi 96 
#$ -R y
 
In this example, you would be requesting 96 cores. The allocation rule employed is "fill-up", i.e., SGE tries to place the MPI tasks on as few hosts as possible (in the "ideal" case, the program would run on exactly 8 hosts with 12 slots on each host, but there is no guarantee that this is going to happen).

Turning on resource reservation (<tt>-R y</tt>) is highly recommended in order to avoid starvation of parallel jobs by serial jobs which "block" required slots on specific hosts.
List of all currently available PEs:

*<tt>mpich</tt>

*<tt>mpich2_mpd</tt>, see the remarks on MPICH/MPICH2 above.

*<tt>intelmpi</tt> for using the Intel MPI Library, see above.

*<tt>smp</tt>: this PE requests all slots on a single host, as required for multithreaded applications (e.g., OpenMP programs and Gaussian single-node jobs).

*<tt>linda</tt> for Gaussian Linda jobs (see the Gaussian section below).

Note that this list will grow as support for further compilers and MPI libraries is added.
 
... tbc ...
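As a sketch of a complete submission script for an MPI job using the Intel toolchain (the startup command in the last line is an assumption and may differ on our clusters; please compare with the parallel examples in your <tt>Examples</tt> directory):
<pre>
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -N mpi_job
#$ -l h_rt=24:0:0
#$ -l h_vmem=2G              # memory per slot, i.e. per requested core
#$ -pe intelmpi 96           # parallel environment and number of slots
#$ -R y                      # turn on resource reservation

module load intel/ics
module load intel/impi

# $NSLOTS is set by SGE to the number of granted slots (96 in this example)
mpirun -np $NSLOTS ./my_mpi_prog
</pre>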
 
<br>
 
==== Interactive jobs  ====
 
==== Monitoring and managing your jobs  ====
 
*<tt>qstat</tt> displays the status of your jobs in the queueing system.

*<tt>qstat -j &lt;jobid&gt;</tt> gives a more verbose output, which is particularly useful when analyzing why your job won't run.

*<tt>qdel &lt;jobid&gt;</tt> removes a (pending or running) job from the system.

*<tt>qalter</tt> modifies the attributes of a (pending) job.

*<tt>qhost</tt> displays the status and load of the execution hosts.
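A typical monitoring session might look like this (the job id 123456 is only an illustration):
<pre>
qstat                        # list your jobs and their states (qw, r, ...)
qstat -j 123456              # detailed information, e.g. why the job is still pending
qalter -l h_rt=4:0:0 123456  # modify a resource request of the pending job
qdel 123456                  # remove the job from the system
</pre>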
 
==== Array jobs ====
 
... are a very efficient way of managing your jobs under certain circumstances (e.g., if you have to run one identical program many times on different data sets, with different initial conditions, etc.). Please refer to the documentation provided below.
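A sketch of a corresponding submission-script fragment (the <tt>-t</tt> option and the <tt>$SGE_TASK_ID</tt> variable are standard SGE features; the file names are placeholders):
<pre>
#$ -t 1-100                  # run 100 tasks of this job, numbered 1..100
#$ -l h_rt=2:0:0
#$ -l h_vmem=2G

# each task processes its own input file, selected via its task id
./myprog input_$SGE_TASK_ID.dat > output_$SGE_TASK_ID.dat
</pre>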
 
=== Documentation  ===
 
== Using the Altix UV 100 system  ==
 
The SGI system is used for very specific applications (in need of a large and highly performant shared-memory system) and can presently only be accessed by the Theoretical Chemistry group. Entitled users may log in to the system via <tt>ssh</tt> (the same rules as for the login nodes of the main system apply, i.e., access is only possible from within the intranet of the University; otherwise you have to establish a VPN tunnel):
ssh abcd1234@uv100.hpc.uni-oldenburg.de
 
The Altix UV system has a RHEL 6.0 operating system installed.
As on the IBM clusters, the modules environment is used.
 
=== Compiling and linking applications ===
 
It is strongly recommended to use MPT (Message Passing Toolkit), SGI's own implementation of the MPI standard. Only then can the highly specialized hardware architecture of the system be fully exploited.
The MPT module must be loaded both for compiling and (in general) at runtime in order for your application to find the dynamically linked libraries:
 
module load mpt
 
Note that MPT is not a compiler. SGI does not provide its own compilers for x86-64 based systems. One may, e.g., use the Intel compiler:
 
module load intel/ics
 
Basically, you can use the compiler the same way you are accustomed to and link against the MPT library by adding the flag <tt>-lmpi</tt>. See the documentation provided below, which also explains how to run MPI programs.
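For example, compiling a simple MPI program with the Intel compiler against MPT might look like this (a sketch; <tt>myprog.c</tt> is a placeholder):
<pre>
module load mpt              # SGI Message Passing Toolkit (MPI library)
module load intel/ics        # Intel compilers

icc -O2 -o myprog myprog.c -lmpi     # link against the MPT MPI library
</pre>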
 
=== Documentation ===
 
* [http://techpubs.sgi.com/library/manuals/3000/007-3773-003/sgi_html/index.html MPT User's Guide]
 
*[http://docs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=linux&db=bks&srch=&fname=/SGI_Developer/LX_86_AppTune/sgi_html/front.html Application Tuning Guide for SGI X86-64 Based Systems]
 
(Both of the above are also available as PDF download.)
 
 
= Application Software and Libraries  =
 
== Computational Chemistry  ==
 
=== Gaussian 09  ===
 
==== Single-node (multi-threaded) Gaussian jobs ====
 
You have to use the SMP parallel environment (see above) to ensure that all slots are on the same host,
as in the following example:
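(A sketch modeled on the Linda example below; the memory request is a placeholder, and <tt>g09run</tt> is the wrapper script provided on the cluster.)
<pre>
#$ -pe smp 12
#$ -R y
#$ -l h_vmem=3G

module load gaussian
g09run myinputfile
</pre>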
 
Your input file <tt>myinputfile</tt> (link0 section) must then contain the line:
%NProcShared=12
 
Of course, you also have to reserve enough memory and disk space for your job.
 
In the above example, you are reserving all 12 slots that a single host can offer.
If you requested less than 12 slots, other users may have jobs running on that host, too.
 
==== Linda jobs ====
 
Use the <tt>linda</tt> parallel environment (cf. above). The number of requested slots '''must''' be an integer multiple of 12 (= the maximum number of slots per host). E.g., for a Linda job requesting four nodes (Linda workers), the relevant section of the submission script would be:
#
#$ -pe linda 48
#$ -R y
#
module load gaussian
g09run myinputfile
#
 
It is mandatory that the input file contains the line
%LindaWorkers=
Otherwise, your job will not be started as a Linda job. The wrapper script parses the input file and looks for precisely that keyword (anything after the "=" is ignored; the wrapper replaces this line and fills in the correct node list for the running job). Note that the <tt>%NProcLinda</tt> directive of older Gaussian versions is deprecated and should no longer be used.
 
=== MOLCAS  ===
 
not yet installed
 
... tbc ...
 
=== MOLPRO  ===
 
not yet installed
 
... tbc ...
 
 
== Matlab  ==
 
 
 
 
== LEDA  ==
 
To set the correct paths and environment variables use the corresponding module:
 
 
There is also a multi-threaded version available:
 
 
= Advanced Usage  =
 
Here you will find, among other things, hints on how to analyze and optimize your programs using HPC tools (profilers, debuggers, performance libraries), and other useful information.
 
... tbc ...
