Best Practice

From HPC users
Revision as of 15:48, 15 October 2021 by Schwietzer

Introduction

HPC clusters are expensive, and this should be kept in mind when running jobs on our clusters. The hardware of a single compute node costs around 3,000 to 4,000€ (more for special nodes, e.g. nodes with high memory, NVMe adapters or GPUs). Including the additional hardware required for an HPC cluster (storage, network, ...), the bare-metal cost of a core-hour is around 0.01€. At least half of that again has to be added for electricity, and further costs arise from many other things (server room, software, staff at Scientific Computing and IT Services, maintenance, ...). In total, this adds up to a few cents per core-hour, or roughly 1€ per node-hour.
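To make these figures more tangible, the following Python sketch estimates how many core-hours a job consumes and roughly what it costs. It is only an illustration: it assumes the approximate rate of 1€ per node-hour quoted above (special nodes cost more), and the function names and the example job (4 nodes with 24 cores each, running for 48 hours) are hypothetical.

```python
# Rough cost estimate for an HPC job (illustrative sketch, not an official tool).
# ASSUMPTION: ~1 EUR per node-hour, the approximate figure quoted in the text;
# real costs depend on the node type (high-memory, NVMe and GPU nodes cost more).

EUR_PER_NODE_HOUR = 1.0  # approximate, from the text


def core_hours(cores_per_node: int, nodes: int, walltime_hours: float) -> float:
    """Total core-hours consumed by a job."""
    if cores_per_node <= 0 or nodes <= 0 or walltime_hours < 0:
        raise ValueError("cores and nodes must be positive, walltime non-negative")
    return cores_per_node * nodes * walltime_hours


def cost_eur(nodes: int, walltime_hours: float,
             eur_per_node_hour: float = EUR_PER_NODE_HOUR) -> float:
    """Approximate cost of a job, based on the node-hours it occupies."""
    if nodes <= 0 or walltime_hours < 0:
        raise ValueError("nodes must be positive, walltime non-negative")
    return nodes * walltime_hours * eur_per_node_hour


if __name__ == "__main__":
    # Hypothetical job: 4 nodes with 24 cores each, running for 48 hours.
    nodes, cores, hours = 4, 24, 48
    print(f"core-hours: {core_hours(cores, nodes, hours):.0f}")
    print(f"approx. cost: {cost_eur(nodes, hours):.2f} EUR")
```

Under these assumptions, a large job that fails because of a setting that was never tested wastes a few hundred euros, which is a strong argument for running small test jobs first.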

This means you should use the HPC cluster responsibly and only run large jobs when you are as certain as possible that they will produce valuable output. Of course, this requires testing beforehand, and testing with small test jobs is highly encouraged. Also, since ours is a Tier 3 HPC system, you do not have to be an HPC expert, and mistakes are allowed as long as you learn from them. With the experience you gain here, you can apply for computing time on the larger Tier 2 or Tier 1 systems, where your access to core-hours is strictly limited (and you want to avoid wasting hours).

The following best-practice advice should give you some ideas on how to make the best use of the cluster.

Running Many Jobs in Parallel

Running Jobs with High I/O

This section will follow.