Difference between revisions of "SLURM Job Management (Queueing) System"

From HPC users
The new system that will manage the user jobs on CARL and EDDY will be [https://slurm.schedmd.com/ SLURM] (formerly known as '''S'''imple '''L'''inux '''U'''tility for '''R'''esource '''M'''anagement). SLURM is a free and open-source job scheduler for Linux and Unix-like kernels and is used on about 60% of the world's supercomputers and computer clusters. If you used the job scheduler (Sun Grid Engine, or SGE) of FLOW and HERO, it will be easy to get used to SLURM because the concepts are quite similar.

SLURM provides three key functions:
* it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work
* it provides a framework for starting, executing and monitoring work (typically a parallel job) on a set of allocated nodes
* it arbitrates contention for resources by managing a queue of pending work

Revision as of 14:11, 8 December 2016

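In practice, the resource-allocation function described above is exercised through a batch script submitted with <code>sbatch</code>: the <code>#SBATCH</code> comment lines request resources for a duration of time, and the remaining lines are the work to run on the allocated node(s). A minimal sketch follows; the job name, resource values, and output filename are illustrative assumptions, not CARL/EDDY defaults:

```shell
#!/bin/bash
#SBATCH --job-name=hello          # job name shown in the queue (illustrative)
#SBATCH --nodes=1                 # request one compute node
#SBATCH --ntasks=1                # run a single task (process)
#SBATCH --time=0-00:05            # wall-clock limit: 5 minutes
#SBATCH --mem-per-cpu=100M        # memory per allocated CPU
#SBATCH --output=hello.%j.out     # stdout file; %j expands to the job ID

# The actual work: here just report which allocated node we ran on.
msg="Hello from $(hostname)"
echo "$msg"
```

Submitted with <code>sbatch hello.sh</code>, the job enters the pending-work queue (visible via <code>squeue</code>) until SLURM allocates the requested resources, then runs and writes its output to the <code>hello.&lt;jobid&gt;.out</code> file.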