How to Share Fair

Introduction

The HPC cluster is a shared resource used by many researchers (more than 150 per year). Unfortunately, it is also a limited resource, so it is important that it is shared in a fair way. Fair sharing is one of the tasks of the job scheduler Slurm, but you can also do your part as a user of the cluster.

The Scheduler

Slurm has a rather complex fair-share mechanism built in, which takes many factors into account. On CARL and EDDY, we have enabled two main factors: fair-share and wait time. The first one, the fair-share factor, is based on how much computing a user has done in the past couple of weeks. The second one is based on the wait time in the queue: if two users have used the same amount of computing time, the job that was submitted earlier will also start earlier. But if one user has not used the cluster for a while, his or her jobs can start before jobs that were submitted earlier. The scheduler also considers the job size and makes sure that resources are kept free until a large job can start. And last but not least, the scheduler uses back-filling, which means that small and short-running jobs are started earlier to fill the gaps between larger and long-running jobs.
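
If you want to see how these factors affect your own jobs, you can query the scheduler directly. The commands below are standard Slurm tools; the exact columns that are displayed depend on the cluster configuration.

    # show your recent usage and fair-share value
    # (a lower recent usage results in a higher priority)
    sshare -u $USER

    # show the individual priority components (fair-share, age/wait time,
    # job size) of your pending jobs
    sprio -u $USER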

Unfortunately, the scheduler struggles if the jobs differ widely in their resource requirements (single-core vs. many-core, low vs. high memory) and if jobs are allowed to run for very long times. This is, of course, exactly the situation on CARL. This is also the reason why the scheduling policies at other supercomputing centers (e.g. HLRN) are much more restrictive (allocation only of full nodes, run times limited to 12-24 hours). However, there are good reasons to keep the more flexible scheduling on CARL.

Fair Job Submission

As a user, you can help to improve the fair sharing of the limited HPC resources. Here are some guidelines that might help:

  • Only submit jobs that you need to run: The use of the HPC cluster is essentially free of charge for the users, but one should not forget that providing the resources costs money. Accordingly, you should use the available resources sparingly and only start those computations that are useful for your own research. Of course, this also includes smaller test calculations. If, for example, you vary a value a hundred times in a parameter study, consider beforehand whether 50 variations might be enough to achieve the same result.
  • Only request the resources you really need: Typically, you will request one or more cores (using the options --ntasks and/or --cpus-per-task), and you should make sure that your job then actually uses all the requested cores. Jobs containing different steps of a workflow should be split into separate jobs if the steps use different numbers of cores; you can use job dependencies for this (a minimal sketch is given after this list). In addition to cores, you may also request memory with --mem (per node) or --mem-per-cpu (per core). If you request more than the default memory of a partition (e.g. 5000M per core in carl.p), you should check afterwards whether your job really used the requested memory.
  • Make sure your job runs at optimal performance: Whether an application runs with optimal performance on the cluster is often difficult to judge, because a number of factors play a role. First of all, the application should be built with the most recent compiler available and with compiler optimizations enabled. Wherever possible, the available numerical libraries such as LAPACK, FFTW, or MKL should be used; the applications provided in the modules fulfill these requirements. For parallel applications, benchmarks should be used to find out how well the application scales on the system and up to which number of cores parallel computation is still efficient (a simple scaling test is sketched below). For I/O, $WORK should preferably be used.
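
As a sketch of the first two points, the following example shows two small batch scripts for a workflow with a parallel main step and a single-core post-processing step, chained with a job dependency. The script and program names as well as the requested resources are only placeholders and have to be adapted to your own application.

    step1.job (parallel main step):

        #!/bin/bash
        #SBATCH --partition=carl.p
        #SBATCH --ntasks=16            # request only as many cores as the program uses
        #SBATCH --mem-per-cpu=5000M    # default memory per core in carl.p
        #SBATCH --time=0-06:00         # a realistic time limit helps back-filling
        srun ./my_parallel_program     # placeholder executable

    step2.job (single-core post-processing):

        #!/bin/bash
        #SBATCH --partition=carl.p
        #SBATCH --ntasks=1
        #SBATCH --mem=2G
        #SBATCH --time=0-01:00
        ./my_postprocessing            # placeholder executable

    Submission with a job dependency:

        # submit the first step, then chain the second step to it
        jobid=$(sbatch --parsable step1.job)
        sbatch --dependency=afterok:$jobid step2.job

After the jobs have finished, the Slurm accounting can be used to compare the requested and the actually used resources, for example:

    # elapsed wall time, consumed CPU time and peak memory of a finished job
    # (replace <jobid> with the real job ID)
    sacct -j <jobid> --format=JobID,Elapsed,TotalCPU,MaxRSS,ReqMem,State

If the seff tool is installed on the cluster, seff <jobid> also gives a compact summary of the CPU and memory efficiency.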
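
For a first impression of the parallel scaling mentioned in the last point, the same batch script can be submitted with different core counts and the resulting run times compared. A minimal sketch (scaling.job and the core counts are placeholders):

    # command-line options override the #SBATCH directives in the script,
    # so the same script can be reused for every core count
    for n in 1 2 4 8 16 32; do
        sbatch --ntasks=$n --job-name=scaling_$n scaling.job
    done

Comparing the Elapsed times of the finished jobs (e.g. with sacct) then shows up to which core count the application still scales reasonably well.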