Best Practice
Revision as of 14:47, 15 October 2021
Introduction
HPC clusters are expensive, and this should be kept in mind when running jobs on our clusters. The hardware of a compute node costs around 3,000 to 4,000€ (more for special nodes, e.g. those with high memory, NVMe adapters, or GPUs). Including the additional hardware an HPC cluster requires (storage, network, ...), the bare-metal cost of a core-hour is around 0.01€. At least half of that again has to be added for electricity, and further costs arise from many other things (server room, software, staff at Scientific Computing and IT Services, maintenance, ...). In total, this adds up to a few cents per core-hour, or roughly 1€ per node-hour.
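The figures above can be turned into a back-of-the-envelope cost estimate before submitting a large job. The following sketch uses the numbers from this paragraph (0.01€ per bare-metal core-hour, plus half again for electricity); the job parameters and the 64-core node size are hypothetical examples, not data about our cluster.

```python
# Rough job-cost estimate based on the figures in the text above.
# All parameters below are illustrative assumptions.

CORE_HOUR_BAREMETAL_EUR = 0.01  # bare-metal hardware cost per core-hour
ELECTRICITY_FACTOR = 0.5        # at least half of that again for electricity

# Effective lower bound per core-hour (hardware + electricity only)
CORE_HOUR_EUR = CORE_HOUR_BAREMETAL_EUR * (1 + ELECTRICITY_FACTOR)

def job_cost_eur(nodes: int, cores_per_node: int, hours: float) -> float:
    """Lower-bound cost of a job in euros (ignores room, staff, software)."""
    return nodes * cores_per_node * hours * CORE_HOUR_EUR

# Example: a hypothetical 10-node job on 64-core nodes running for 24 hours.
print(f"{job_cost_eur(nodes=10, cores_per_node=64, hours=24):.2f} EUR")
```

Note that with 64-core nodes this works out to about 0.96€ per node-hour, consistent with the ~1€/node-hour figure quoted above, even before the remaining overheads are added.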
This means you should use the HPC cluster responsibly and only run large jobs when you are as certain as possible that they will produce valuable output. Of course this requires testing beforehand, and testing (with small test jobs) is highly encouraged. Also, since ours is a tier-3 HPC system, you do not have to be an HPC expert, and mistakes are allowed as long as you learn from them. With the experience you gain here, you can apply to the larger tier-2 or tier-1 systems, where your allocation of core-hours is strictly limited (and you will want to avoid wasting them).
The following Best Practice advice should give you some ideas on how to make the best use of the cluster.
Running Many Jobs in Parallel
How to Manage Many Jobs
Running Jobs with High I/O
will follow