Memory Overestimation

From HPC users
Revision as of 15:35, 1 August 2013 by Melchert

One of the most important consumable resources that must be allocated for a job is its memory. As pointed out in the [SGE_Job_Management_(Queueing)_System#Memory|overview], memory (physical plus virtual) is requested via the h_vmem option. The h_vmem attribute refers to the memory per job slot, i.e. it is multiplied by the number of slots when a parallel job is submitted. Different types of nodes offer different amounts of total memory for jobs. For example, a standard node on HERO (130 in total) offers 23GB and a big node on HERO (20 in total) offers 46GB.
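The per-slot rule can be illustrated with a short calculation. The following is a minimal sketch and not part of the SGE tooling: the function names and the 4G, 8-slot request are hypothetical, and it assumes the whole job runs on a single host. Only the node capacities (23GB and 46GB) are taken from the text above.

```python
# Node capacities in GB, as quoted above for HERO.
NODE_MEMORY_GB = {"standard": 23.0, "big": 46.0}


def total_request_gb(h_vmem_gb, slots):
    """Total memory reserved: h_vmem is per slot, so multiply by the slot count."""
    return h_vmem_gb * slots


def node_types_that_fit(h_vmem_gb, slots):
    """Node types whose total memory covers the request (single-host assumption)."""
    total = total_request_gb(h_vmem_gb, slots)
    return [name for name, capacity in NODE_MEMORY_GB.items() if total <= capacity]


if __name__ == "__main__":
    # Hypothetical request: h_vmem=4G with 8 slots reserves 32G in total,
    # which exceeds a standard node and therefore only fits on a big node.
    print(total_request_gb(4.0, 8))
    print(node_types_that_fit(4.0, 8))
```

Under these assumptions, a seemingly modest per-slot value can already rule out the standard nodes once it is multiplied by the slot count.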

The operating efficiency of the HPC system, in particular the workload of the HERO component (which follows a fill-up allocation rule for jobs), is directly linked to the memory allocated to the jobs. For the cluster to operate as efficiently as possible, proper memory allocation by the users, i.e. as little memory overestimation per job as possible, is essential.

Consider the following extreme example: a user specifies h_vmem=1.7G for a 12-slot job, so the job allocates 20.4G overall. However, upon execution the job used only approximately 2G overall. The parallel environment in this actual example was smp, so the job ran on a single execution host and the parallel environment memory issue does not arise here. Although this is a rather severe example of memory overestimation (about 18G overall, or roughly 1.5G per slot), the job does not block other jobs from running on that execution host, since it occupies all available slots (i.e. 12 slots).
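The numbers in this example can be reproduced with a few lines of arithmetic. This is only a sketch: the function name is hypothetical, and the used memory is taken as exactly 2G although the text gives it only approximately.

```python
def overestimation_gb(h_vmem_gb, slots, used_gb):
    """Return (total requested, unused total, unused per slot), all in GB."""
    requested = h_vmem_gb * slots      # h_vmem applies to every slot
    excess = requested - used_gb       # reserved but never used
    return requested, excess, excess / slots


if __name__ == "__main__":
    # Values from the example: h_vmem=1.7G, 12 slots, roughly 2G actually used.
    requested, excess, per_slot = overestimation_gb(1.7, 12, 2.0)
    print(f"requested {requested:.1f}G, unused {excess:.1f}G, {per_slot:.2f}G per slot")
    # prints: requested 20.4G, unused 18.4G, 1.53G per slot
```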

However, other cases do have an impact on other users: a user specifies h_vmem=6G for a single-slot job that turns out to have a peak memory usage of only 36M. Due to memory restrictions, four such jobs cannot run on a single standard host (4 × 6G = 24G exceeds the 23G available). However, three such jobs already block a lot of resources, leaving 5G for the remaining 9 slots. Given that a typical job here uses 2-3G, this allows for only two further jobs. In this case, the memory dissipation on that host amounts to 17G (in the most optimistic case).
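How quickly such requests block a host can be sketched as a simple packing calculation. The function name is hypothetical, the 2.5G request for a typical job is an assumed midpoint of the 2-3G range, and the 100M alternative request is purely illustrative. The node figures (23G, 12 slots) come from the text.

```python
import math

NODE_MEM_GB = 23.0   # standard node on HERO
NODE_SLOTS = 12      # slots per node, as in the smp example


def max_jobs(free_mem_gb, free_slots, h_vmem_gb, slots_per_job=1):
    """Number of identical jobs that fit, limited by memory and by slots."""
    by_memory = math.floor(free_mem_gb / (h_vmem_gb * slots_per_job))
    by_slots = free_slots // slots_per_job
    return min(by_memory, by_slots)


def further_jobs(blocking_h_vmem_gb, blocking_jobs, typical_h_vmem_gb):
    """Typical jobs that still fit after the blocking jobs have reserved memory."""
    mem_left = NODE_MEM_GB - blocking_jobs * blocking_h_vmem_gb
    slots_left = NODE_SLOTS - blocking_jobs
    return max_jobs(mem_left, slots_left, typical_h_vmem_gb)


if __name__ == "__main__":
    # At most three 6G single-slot jobs fit on a 23G node.
    print(max_jobs(NODE_MEM_GB, NODE_SLOTS, 6.0))
    # With 6G reserved per job, only two 2.5G jobs fit in the remaining 5G.
    print(further_jobs(6.0, 3, 2.5))
    # With a hypothetical 100M request instead, all nine remaining slots are usable.
    print(further_jobs(0.1, 3, 2.5))
```

In this sketch the limiting factor switches from memory to slots once the three small jobs request an amount close to what they actually use.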

Note, however, that the above examples have to be taken with a grain of salt. In general there is a difference between the peak memory usage of a job and its current memory usage. A job might well proceed in two phases: (i) one in which it needs only little memory, and (ii) one in which it requires a lot of memory. However, such a two-phase scenario seems to occur only rarely.

In particular, in late May and early June of 2013 there was a phase in which the cluster was heavily used and memory overestimation by individual users was a severe issue. To avoid such situations in the future, it seems necessary to point out the benefit of proper memory allocation from time to time.


The two plots below summarize data that quantify the memory usage of the 130 standard nodes on HERO. The data have been collected since 18 June 2013, with the memory usage of the nodes recorded once per working day.

File:MemDissipation raw mem.pdf