Quickstart Guide
Revision as of 15:24, 27 February 2017
This quick start guide helps you begin working on the HPC clusters CARL and EDDY.
If you have questions that aren't answered in this guide, please contact the Scientific Computing
HPC Cluster Overview
The HPC cluster, located at the Carl von Ossietzky Universität Oldenburg, consists of two clusters named CARL and EDDY. They are connected via FDR InfiniBand for parallel computations and parallel I/O. CARL uses an 8:1 blocking network topology and EDDY uses a fully non-blocking network topology. In addition, they are connected via an Ethernet network for management and IPMI. They also share a GPFS parallel file system with about 900 TB net capacity and 17/12 GB/s parallel read/write performance. Additional storage is provided by the central NAS system of the IT services.
Both clusters are based on the Lenovo NeXtScale system.
CARL (271 TFlop/s theoretical peak performance):
- 327 compute nodes (9 of these with a GPU)
- 7,640 CPU cores
- 77 TB of RAM
- 360 TB local storage
EDDY (201 TFlop/s theoretical peak performance):
- 244 compute nodes (3 of these with a GPU)
- 5,856 CPU cores
- 21 TB of RAM
For more detailed information about the cluster, you can visit our Overview.
Login
If you want to access the HPC-cluster, you need to have an authorized university account. If you are not authorized yet, request an account.
You can use an SSH client of your choice, or the command line on Linux computers, to connect to the cluster via ssh. To do so, use either
carl.hpc.uni-oldenburg.de
or
eddy.hpc.uni-oldenburg.de
For further information about the login, please see the guide on the page Login to the HPC cluster.
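As a small illustration of the login step, the following sketch builds the ssh command for CARL from a username and a host name. The username abcd1234 is the placeholder used throughout this guide and must be replaced with your own account; the variable names are chosen here only for illustration.

```shell
# Placeholder account name from this guide; replace it with your own.
HPC_USER="abcd1234"
# Login host for CARL as given above; use the EDDY host name for EDDY.
HPC_HOST="carl.hpc.uni-oldenburg.de"

# Assemble the command and print it so it can be checked before use.
LOGIN_CMD="ssh ${HPC_USER}@${HPC_HOST}"
echo "$LOGIN_CMD"
```

Running the printed command in a terminal opens the session, provided your account has been authorized.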
File System
The cluster offers two file systems: the GPFS Storage Server (GSS) and the central storage system of the IT services.
GPFS Storage Server (GSS):
- parallel file system
- total (net) capacity is about 900 TB
- R/W performance is up to 17/12 GB/s over FDR Infiniband
- can be mounted using SMB/NFS
- used as the primary storage for HPC (for data that is read/written by compute nodes)
- no backup!
Central storage system of the IT services (Isilon Storage System):
- NFS-mounted $HOME-directories
- high availability
- snapshots
- backup
- used as permanent storage!
HOME:
- Path: /user/abcd1234
- Environment variable: $HOME
DATA:
- Path: /gss/data/abcd1234
- Environment variable: $DATA
WORK:
- Path: /gss/work/abcd1234
- Environment variable: $WORK
Scratch:
- Path: /scratch
- Environment variable: $TMPDIR
- Remember: "/scratch" (or $TMPDIR) is only available if you requested it in your jobscript. Further information can be found on the page File system and Data Management, in the section Scratch space / TempDir.
If you look at your home directory, you will see two links: old_home_abcd1234 -> /bright/user/../abcd1234 and old_work_abcd1234 -> /bright/data/work/../abcd1234.
Further information can be found on the related page in the wiki: File system and Data Management
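The storage locations above are meant to be reached through their environment variables. The sketch below prints the current value of each one, or "(not set)" when a variable is missing, which is what one would expect for $TMPDIR outside a job that requested scratch space. The helper name storage_path is hypothetical and not part of the cluster setup.

```shell
# Print the value of the environment variable named in $1,
# or "(not set)" if it is empty or undefined.
storage_path() {
  # Indirect lookup: expands to e.g. value=${DATA:-}
  eval "value=\${$1:-}"
  if [ -n "$value" ]; then
    printf '%s\n' "$value"
  else
    printf '(not set)\n'
  fi
}

# Variable names as listed in this guide.
for name in HOME DATA WORK TMPDIR; do
  printf '%-7s %s\n' "$name" "$(storage_path "$name")"
done
```

Using the variables instead of hard-coded paths keeps scripts independent of the username placeholder abcd1234.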
Software and Environment
Many software packages are pre-installed, including compilers, libraries, pre- and postprocessing tools, and further applications. They are managed with the module command.
With this command you can:
- list the available software
- access/load software (even in different versions)
Example: Show the software on CARL and EDDY and load the Intel compiler
[abcd1234@hpcl001 ~]$ module avail
-----------/cm/shared/uniol/modules/compiler-----------
...
icc/2016.3.210
[abcd1234@hpcl001 ~]$ module load icc/2016.3.210
[abcd1234@hpcl001 ~]$ module list
Currently loaded modules:
...
icc/2016.3.210
...
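The same commands can also be placed in a script. The following is a minimal sketch that loads the Intel compiler module from the example and lists the loaded modules, but only if the module command exists in the current shell. The variable names are illustrative assumptions, and the module version is simply the one shown above; the versions actually installed may differ.

```shell
# Module name copied from the example session above.
COMPILER_MODULE="icc/2016.3.210"

if command -v module >/dev/null 2>&1; then
  # On the cluster: load the compiler and show what is now loaded.
  module load "$COMPILER_MODULE"
  module list
  MODULE_STATUS="requested"
else
  # Elsewhere (e.g. on a laptop) the module command does not exist.
  echo "The module command is not available here; run this on CARL or EDDY." >&2
  MODULE_STATUS="unavailable"
fi
```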
Basic Job Submission
The new workload manager and job management queueing system on CARL and EDDY is called SLURM. SLURM is a free and open-source job scheduler for Linux and Unix-like kernels and is used on about 60% of the world's supercomputers and computer clusters.
To submit a job on the HPC cluster you need two things:
- the command sbatch
- a jobscript
If you have your jobscript (an example is linked at the end) you can simply queue it with the command:
sbatch -p carl.p my_first_job.job
The option "-p" selects the partition to use. Choosing a suitable partition allows your job to start sooner; with an unsuitable one, it may wait in the queue for a long time. We therefore recommend reading the wiki article about partitions.
Once you have submitted your job successfully, you can check its status with
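To make the submission step concrete, here is a minimal sketch of what my_first_job.job could contain. It is not the example jobscript from the wiki: the job name and the resource values (one task, ten minutes, 1 GB of memory) are assumptions chosen only for illustration, and no scratch space is requested. The snippet writes the file to the current directory so that it can be inspected before submitting.

```shell
# Write a minimal jobscript; the quoted 'EOF' keeps $(hostname) unexpanded
# so that it is evaluated on the compute node, not at creation time.
cat > my_first_job.job <<'EOF'
#!/bin/bash
# Hypothetical job name shown by squeue.
#SBATCH --job-name=my_first_job
# Partition, matching the sbatch example in this guide.
#SBATCH --partition=carl.p
# Assumed resources: one task, 10 minutes wall time, 1 GB of memory.
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --mem=1G
# Output file; %j is replaced by the job ID.
#SBATCH --output=slurm-%j.out

echo "Running on $(hostname)"
EOF

echo "Wrote my_first_job.job"
```

Because the partition is set inside the file, such a script could also be submitted without "-p"; an option given on the sbatch command line takes precedence over the one in the script.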
squeue -u abcd1234
As always, "abcd1234" is just a placeholder for your own username. Information such as JOBID, PARTITION, JOBNAME, USER, TIME and the number of NODES will be displayed. Omitting the option -u shows the jobs of all users on the cluster.
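To avoid typing the username by hand, the status check can take it from the environment. This is a small sketch, assuming that $USER (or the output of id -un) matches your cluster account; the variable name ME is illustrative.

```shell
# Use $USER if set, otherwise ask the system for the current user name.
ME="${USER:-$(id -un)}"

if command -v squeue >/dev/null 2>&1; then
  # List only your own jobs.
  squeue -u "$ME" || echo "squeue could not reach the scheduler." >&2
else
  echo "squeue is not available here; run this on a CARL or EDDY login node." >&2
fi
```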
Further information about the job submission and an example jobscript can be found on the related page in the wiki: SLURM Job Management (Queueing) System