HPC Facilities of the University of Oldenburg 2016

From HPC users

Overview

In 2016, two new HPC clusters were installed at the University of Oldenburg in order to replace the previous systems HERO and FLOW. The new HPC clusters are:

  • CARL (named after Carl von Ossietzky, of course, but if you like acronyms then try "Carl's Advanced Research Lab") serving as a multi-purpose compute cluster
- Lenovo NeXtScale System
- 327 compute nodes
- 7,640 CPU cores
- 77 TB of main memory (RAM)
- 360 TB of local storage (HDDs and SSD flash adapters)
- 271 TFlop/s theoretical peak performance
  • EDDY (named after the swirling of a fluid) used for research in wind energy
- Lenovo NeXtScale System
- 244 compute nodes
- 5,856 CPU cores
- 21 TB of main memory (RAM)
- 201 TFlop/s theoretical peak performance
  • GOLD (GPU-Cluster Oldenburg)
- GPU Cluster of work group machine learning
- for further information see below


Both clusters, CARL and EDDY, share some common infrastructure, namely

  • FDR InfiniBand interconnect for parallel computations and parallel I/O
- CARL uses an 8:1 blocking network topology
- EDDY uses a fully non-blocking network topology
  • Ethernet Network for Management and IPMI
  • a GPFS parallel file system with about 900 TB net capacity and 17/12 GB/s parallel read/write performance
  • additional storage is provided by the central NAS-system of IT services

The systems are housed and maintained by IT services, and the administration of the clusters is done using the Bright Cluster Manager 7.3.

Detailed Hardware Overview

CARL

  • 158 "Standard" compute nodes (MPC-STD)
- 2x Intel Xeon CPU E5-2650 v4 12C @ 2.2 GHz
- 256 GB RAM consisting of 8x 32GB TruDDR4 modules @ 2400MHz
- Intel C612 Chipset
- 1 TB 7.2K 6Gbps HDD (used for local storage)
- Connect-IB Single-port Card
  • 128 "low-memory" compute nodes (MPC-LOM)
These nodes are identical to the "Standard" compute nodes (MPC-STD).
The only difference is that they have 128 GB of RAM instead of 256 GB.
  • 30 "High-memory" compute nodes (MPC-BIG)
- 2x Intel Xeon CPU E5-2667 v4 8C @ 3.2 GHz
- 512 GB RAM consisting of 16x 32GB TruDDR4 modules @ 2400MHz
- Intel C612 Chipset
- Intel P3700 2.0TB NVMe Flash Adapter
- Connect-IB Single-port Card
- four nodes (mpcb[001-004]) include two NVIDIA GTX 1080 GPUs each
  • 2 "Pre- and postprocessing" compute nodes (MPC-PP)
- 4x Intel Xeon CPU E7-8891 v4 10C @ 2.8 GHz
- 2048 GB RAM consisting of 64x 32GB TruDDR4 modules @ 2400MHz
- Intel C612 Chipset
- Intel P3700 2.0TB NVMe Flash Adapter
- Connect-IB Single-port Card
  • 9 "GPU" compute nodes (MPC-GPU)
- These nodes are identical to the "Standard" compute nodes (MPC-STD).
- The only difference is the added GPU (NVIDIA Tesla P100 16GB, PCIe edition).
- mpcg[001-003] have two cards each; the other nodes have one card each

EDDY

  • 160 "Low-memory" compute nodes
- 2x Intel Xeon CPU E5-2650 v4 12C @ 2.2 GHz
- 64 GB RAM consisting of 8x 8GB TruDDR4 modules @ 2400MHz
- Intel C612 Chipset
- Connect-IB Single-port card
  • 81 "High-memory" compute nodes
- 2x Intel Xeon CPU E5-2650 v4 12C @ 2.2 GHz
- 128 GB RAM consisting of 8x 16GB TruDDR4 modules @ 2400MHz
- Intel C612 Chipset
- Connect-IB Single-port card
  • 3 "GPU" compute nodes
- 2x Intel Xeon CPU E5-2650 v4 12C @ 2.2 GHz
- 256 GB RAM consisting of 8x 32GB TruDDR4 modules @ 2400MHz
- Intel C612 Chipset
- 1 TB 7.2K 6Gbps HDD (used for local storage)
- Connect-IB Single-port card
- added GPU: NVIDIA Tesla P100 16GB (PCIe edition)

GOLD

- 14x NVIDIA GeForce Titan Black
- 11x NVIDIA GeForce Titan X
- 2x NVIDIA GeForce Titan X Pascal
- 3x NVIDIA Tesla P100 16GB
- 144 CPU cores (Intel Xeon)
- 11 compute nodes with 128-256 GB DDR3/DDR4 RAM each, connected via InfiniBand

GOLD is the dedicated cluster of the Machine Learning work group (agml, Prof. Dr. Lücke) and is therefore completely detached from CARL and EDDY in terms of infrastructure. This means that the modules installed there are installed and maintained separately, and that our software list is not applicable to this specific cluster. However, modules on GOLD are still maintained with the lmod system, and jobs are also managed with SLURM. For this wiki this means that only basic information, such as the use of SLURM and the descriptions and usage examples of individual programs, may also apply to GOLD. Other information can vary strongly between the clusters and might not be applicable to GOLD.
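Since lmod and SLURM are used on GOLD as well as on CARL and EDDY, the basic job workflow is the same on all three clusters. A minimal sketch of a batch script follows; note that the module name, the program, and the resource values are placeholders for illustration, not actual names from any of the clusters:

```shell
#!/bin/bash
#SBATCH --job-name=example        # job name shown in the queue
#SBATCH --ntasks=1                # a single task
#SBATCH --mem=4G                  # requested memory
#SBATCH --time=01:00:00           # wall-clock limit (1 hour)

# Load an environment module via lmod; module names differ between
# GOLD and CARL/EDDY, so check what is available first with `module avail`
module load my-software/1.0       # placeholder module name

# Launch the program as a SLURM job step
srun ./my_program                 # placeholder executable
```

Submit the script with `sbatch jobscript.sh` and monitor it with `squeue -u $USER`; these basic SLURM commands behave the same on all three clusters.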

For further information, you can visit the GOLD website. Should you have technical questions regarding GOLD, please contact Mr. Marco-Marcel Pechtold.