Best Practice

== Introduction ==

HPC clusters are expensive, and this should be kept in mind when running jobs on our clusters. The hardware of a compute node costs around 3,000 to 4,000€ (and more for special nodes, e.g. with high memory, NVMe adapters or GPUs). Together with the additional hardware required for an HPC cluster (storage, network, ...), the ''bare-metal'' cost of a core-hour is around 0.01€. At least half of that again has to be added for electricity, and further costs arise from many other things (server room, software, staff at Scientific Computing and IT Services, maintenance, ...). In total, this adds up to a few cents per core-hour, or roughly 1€ per node-hour.
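
To get a feeling for these numbers, here is a minimal sketch that estimates what a job costs. The rates are the approximate figures from above; the all-inclusive 0.02€/core-hour and the 48-core node are illustrative assumptions, not official prices.

<syntaxhighlight lang="python">
# Rough job-cost estimate. The rates below are illustrative
# assumptions derived from the figures above, not official prices.
BARE_METAL_EUR_PER_CORE_HOUR = 0.01   # hardware only
TOTAL_EUR_PER_CORE_HOUR = 0.02        # assumed all-inclusive rate

def job_cost(cores: int, hours: float,
             rate: float = TOTAL_EUR_PER_CORE_HOUR) -> float:
    """Estimated cost in euros of a job using `cores` cores for `hours` hours."""
    return cores * hours * rate

# Example: an assumed 48-core node running for 24 hours:
# 48 * 24 * 0.02 = 23.04€, i.e. roughly 1€ per node-hour.
print(f"{job_cost(48, 24):.2f} EUR")
</syntaxhighlight>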

This means you should use the HPC cluster responsibly and only run large jobs when you are as certain as possible that they will produce valuable output. Of course this requires testing beforehand, and testing (with small test jobs) is highly encouraged. Also, since ours is a ''tier 3'' HPC system, you do not have to be an HPC expert, and mistakes are allowed as long as you learn from them. With the experience you gain here, you can apply to the larger ''tier 2'' or ''tier 1'' systems, where your access to core-hours is strictly limited (and you want to avoid wasting hours).
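
One simple way to follow the "small test jobs first" advice is to make the problem size a parameter of your program, so that the very same code can be validated on a tiny input before the expensive full run is submitted. A minimal sketch; the function and the step counts are hypothetical placeholders for your own workload:

<syntaxhighlight lang="python">
import argparse

def simulate(n_steps: int) -> float:
    """Hypothetical placeholder for your actual computation."""
    total = 0.0
    for i in range(n_steps):
        total += i * 1e-9  # stand-in for real work
    return total

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # --test runs a tiny problem that finishes in seconds, so the
    # full job is only submitted once the small run looks correct.
    parser.add_argument("--test", action="store_true",
                        help="run a small validation problem instead of the full one")
    args = parser.parse_args()
    n_steps = 10_000 if args.test else 1_000_000_000
    print(simulate(n_steps))
</syntaxhighlight>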

The following best-practice advice should give you some ideas on how to make the best use of the cluster.

== Running Many Jobs in Parallel ==

will follow

== Running Jobs with High I/O ==

will follow