SLURM Job Management (Queueing) System

From HPC users
Revision as of 14:11, 8 December 2016 by Brunken

The new system that will manage user jobs on CARL and EDDY will be SLURM (formerly known as Simple Linux Utility for Resource Management). SLURM is a free and open-source job scheduler for Linux and Unix-like operating systems and is used on about 60% of the world's supercomputers and computer clusters. If you have used Sun Grid Engine (SGE), the job scheduler on FLOW and HERO, SLURM will be easy to get used to because its concepts are quite similar.

SLURM provides three key functions:

  • it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work
  • it provides a framework for starting, executing and monitoring work (typically a parallel job) on a set of allocated nodes
  • it arbitrates contention for resources by managing a queue of pending work
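
The three functions above map roughly onto a batch script: the #SBATCH lines request resources, the script body is the work that SLURM starts on the allocated node, and the job waits in the queue until those resources are free. The following is only a minimal sketch. The file name hello.sh, the job name, the helper function describe_job and all resource values are illustrative assumptions, not settings taken from CARL or EDDY, and a real job may also need a partition, which is not shown here.

```shell
#!/bin/bash
# Hypothetical example script (hello.sh); all values are illustrative.
#SBATCH --job-name=hello        # name shown in the queue
#SBATCH --nodes=1               # request one compute node
#SBATCH --ntasks=1              # run a single task
#SBATCH --time=00:05:00         # wall-clock limit (hh:mm:ss)
#SBATCH --output=hello_%j.out   # %j is replaced by the job ID

# SLURM sets SLURM_JOB_ID inside a job. The fallback "none" lets the
# script also run by hand outside the scheduler, where the #SBATCH
# lines are treated as ordinary comments.
describe_job() {
    echo "job ${SLURM_JOB_ID:-none} on $(uname -n)"
}

describe_job
```

Assuming the script is saved as hello.sh, it would typically be submitted with `sbatch hello.sh`, watched in the queue with `squeue -u $USER`, and cancelled if necessary with `scancel` followed by the job ID.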