HPC Facilities of the University of Oldenburg

Presently, the central HPC facilities of the University of Oldenburg comprise three systems:

  • FLOW (Facility for Large-Scale COmputations in Wind Energy Research)
    IBM iDataPlex cluster solution, 2232 CPU cores, 6 TB of (distributed) main memory, QDR InfiniBand interconnect.
    Theoretical peak performance: 24 TFlop/s.
    LINPACK performance: 19.0 TFlop/s.
  • HERO (High-End Computing Resource Oldenburg)
    Hybrid system composed of two components:
    • IBM iDataPlex cluster solution, 1800 CPU cores, 4 TB of (distributed) main memory, Gigabit Ethernet interconnect.
      Theoretical peak performance: 19.2 TFlop/s (see the worked example after this list).
      LINPACK performance: 8.7 TFlop/s.
    • SGI Altix UltraViolet shared-memory system ("SMP component"), 120 CPU cores, 640 GB of globally addressable memory, NumaLink5 interconnect.
      Theoretical peak performance: 1.3 TFlop/s.
  • GOLEM: older, AMD Opteron-based cluster with 390 cores and 800 GB of (distributed) main memory.
    Theoretical peak performance: 1.6 TFlop/s.
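
The theoretical peak figures above are, in the usual convention, the product of core count, clock rate, and floating-point operations per core per clock cycle. A minimal sketch of the arithmetic (Python), assuming the customary 4 double-precision FLOPs per cycle for the Westmere-EP/Nehalem-EX processors described in the hardware overview below; GOLEM is omitted because its clock rate is not listed here:

    # Peak [TFlop/s] = cores * clock [GHz] * FLOPs per core per cycle / 1000
    # Assumption: 4 double-precision FLOPs/cycle (SSE add + multiply units),
    # as is customary for the Westmere-EP / Nehalem-EX CPUs listed below.
    systems = {
        "FLOW":             (2232, 2.66),   # cores, clock in GHz
        "HERO (iDataPlex)": (1800, 2.66),
        "HERO (Altix UV)":  (120,  2.66),
    }

    FLOPS_PER_CYCLE = 4
    for name, (cores, ghz) in systems.items():
        print(f"{name:18s} {cores * ghz * FLOPS_PER_CYCLE / 1000:5.1f} TFlop/s")
    # -> FLOW ~23.7, HERO (iDataPlex) ~19.2, HERO (Altix UV) ~1.3 TFlop/s

The FLOW result of about 23.7 TFlop/s rounds to the quoted 24 TFlop/s and corresponds to the 2232 Westmere cores; the cfdx and cfdi nodes listed in the hardware overview do not appear to be included in that headline value.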

FLOW and HERO use a common, shared storage system (high-performance NAS Cluster) with a net capacity of 160 TB.

FLOW is used for computationally demanding CFD calculations in wind energy research, conducted by the Research Group TWiST (Turbulence, Wind Energy, and Stochastics) and the ForWind Center for Wind Energy Research. It is, to the best of our knowledge, the largest system in Europe dedicated solely to that purpose.

The main application areas of the HERO cluster are Quantum Chemistry, Theoretical Physics, the Neurosciences, and Audiology. In addition, the system is used by many other research groups of the Faculty of Mathematics and Science and by the Department of Informatics of the School of Computing Science, Business Administration, Economics, and Law.

Hardware Overview

FLOW

  • 122 "low-memory" compute nodes: IBM dx360 M3, dual socket (Westmere-EP, 6C, 2.66 GHz), 12 cores per server, 24 GB DDR3 RAM, diskless (host names cfdl001..cfdl122).
  • 64 "high-memory" compute nodes: IBM dx360 M3, dual socket (Westmere-EP, 6C, 2.66 GHz), 12 cores per server, 48 GB DDR3 RAM, diskless (host names cfdh001..cfdh064).
  • 7 compute nodes: IBM, dual socket (Nehalem-EP, 4C, 2.26 GHz), 8 cores per server, 32 GB DDR3 RAM, 150 GB scratch disk (host names cfdx001..cfdx007).
  • 10 compute nodes: IBM, dual socket (Ivy Bridge-EP, 8C, 2.6 GHz), 16 cores per server, 64 GB DDR3 RAM (host names cfdi*).
  • QDR InfiniBand interconnect (fully non-blocking), 198-port Mellanox IS5200 IB switch (can be extended up to 216 ports).
  • Gigabit Ethernet for File-I/O etc.
  • 10/100 Mb/s Ethernet for management and administrative tasks (IPMI).
  • High-performance IBM GPFS storage system with a capacity of 130 TB, connected via InfiniBand (see the tally sketch below).
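
As a cross-check of the FLOW summary figures, cores and memory can be tallied directly from the node list above; a minimal sketch (Python) using only the counts listed there:

    # Tally cores and RAM over the FLOW node classes listed above.
    flow_nodes = [
        # (host prefix, nodes, cores per node, GB RAM per node)
        ("cfdl", 122, 12, 24),
        ("cfdh",  64, 12, 48),
        ("cfdx",   7,  8, 32),
        ("cfdi",  10, 16, 64),
    ]

    cores = sum(n * c for _, n, c, _ in flow_nodes)
    ram   = sum(n * r for _, n, _, r in flow_nodes)
    print(f"{cores} cores, {ram} GB RAM in total")
    # cfdl + cfdh alone: 122*12 + 64*12 = 2232 cores and
    # 122*24 + 64*48 = 6000 GB ~ 6 TB, i.e. the summary figures quoted at the
    # top of this page appear to refer to those two partitions; the cfdx and
    # cfdi nodes come on top of that.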

HERO

  • 130 "standard" compute nodes: IBM dx360 M3, dual socket (Westmere-EP, 6C, 2.66 GHz), 12 cores per server, 24 GB DDR3 RAM, 1 TB SATAII disk (host names mpcs001..mpcs130).
  • 20 "big" compute nodes: IBM dx360 M3, dual socket (Westmere-EP, 6C, 2.66 GHz), 12 cores per server, 48 GB DDR3 RAM, RAID 8 x 300 GB 15k SAS (host names mpcb001..mpcb020)
  • Gigabit Ethernet for communication of parallel jobs (MPI, LINDA, ...).
  • Second, independent Gigabit Ethernet for File-I/O etc.
  • 10/100 Mb/s Ethernet for management and administrative tasks (IPMI).
  • SGI Altix UV 100 shared-memory system, 20 CPUs (Nehalem-EX, "Beckton", 6C, 2.66 GHz), 120 cores in total, 640 GB DDR3 RAM, NumaLink5 interconnect, RAID 20 x 600 GB SAS 15k rpm (host uv100, see the tally sketch below).
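
The same kind of tally for the HERO components, again using only the counts listed above (Python):

    # Tally cores and RAM over the HERO components listed above.
    hero_nodes = [
        # (component, nodes, cores per node, GB RAM per node)
        ("mpcs", 130, 12, 24),
        ("mpcb",  20, 12, 48),
        ("uv100",  1, 120, 640),
    ]
    for name, n, c, r in hero_nodes:
        print(f"{name:6s} {n * c:5d} cores  {n * r:5d} GB RAM")
    # iDataPlex part (mpcs + mpcb): 1800 cores and 4080 GB ~ 4 TB,
    # matching the HERO summary figures quoted at the top of this page.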

The 1 Gb/s leaf switches have uplinks to a 10 Gb/s backbone (two switches, redundant). The central management interface of both clusters runs on two master nodes (IBM x3550 M3) in an HA setup. Each cluster has two login nodes (IBM x3550 M3).

Operating system: Red Hat Enterprise Linux 6.5

Cluster management software: Bright Cluster Manager 5.1 by ClusterVision B.V.