Difference between revisions of "Queues and resource allocation"

From HPC users

Revision as of 14:40, 26 August 2013

In general, you do not have to worry about queues. Ideally, you only specify the resources needed by the job you are about to submit. This gives the scheduler enough information to decide which queue the job belongs in. Hence, you explicitly allocate resources and implicitly choose a queue. In some cases, however, for example when a job has to run on particular hardware components of the cluster, it helps to know which resources must be requested in order to reach a queue running on that component.

Although you (as a user) should be more concerned with specifying resources than with targeting queues, it is useful to disentangle the relationship between the queues implemented on the HPC system and the resources that must be specified for the scheduler to address a given queue. Also, some of you might be familiar with the concept of queues and prefer to think in terms of them.
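
The idea of "request resources, let the scheduler pick the queue" can be illustrated with a toy model. The following Python sketch is an assumption-laden simplification, not how the grid engine scheduler actually works: the helper names (parse_time, parse_memory, fits) are hypothetical, and only two limits of the queue mpc_std_shrt.q (h_rt and h_vmem, as they appear in its configuration further down this page) are checked.

```python
# Toy model only: the real scheduler also considers user lists,
# parallel environments, host load and more.

# Size suffixes as used by grid engine: lower case is decimal, upper case binary.
UNITS = {"k": 1000, "K": 1024, "m": 1000**2, "M": 1024**2,
         "g": 1000**3, "G": 1024**3}


def parse_time(text):
    """Convert 'h:m:s' (e.g. '192:0:0') to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_memory(text):
    """Convert a size such as '23G' to bytes."""
    if text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(text)


def fits(request, limits):
    """Return True if the requested run time and memory stay within the limits."""
    return (parse_time(request["h_rt"]) <= parse_time(limits["h_rt"])
            and parse_memory(request["h_vmem"]) <= parse_memory(limits["h_vmem"]))


# Limits copied from the mpc_std_shrt.q configuration on this page.
MPC_STD_SHRT = {"h_rt": "192:0:0", "h_vmem": "23G"}

print(fits({"h_rt": "24:0:0", "h_vmem": "2G"}, MPC_STD_SHRT))   # True
print(fits({"h_rt": "300:0:0", "h_vmem": "2G"}, MPC_STD_SHRT))  # False
```

A request that exceeds one of the limits simply does not match this queue, which is the sense in which specifying resources implicitly selects (or excludes) queues.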

Listing all possible queues

Thinking in terms of queues, you might want to see which queues exist on the HPC system. Once logged in to your HPC account, you obtain a full list of all queues a job might be placed in by typing the command qconf -sql. qconf is the grid engine configuration tool which, among other things, allows you to list existing queues and queue configurations. Informally, the option -sql reads: show (s) queue (q) list (l). As a result you might find the following list of queues:

 
cfd_him_long.q
cfd_him_shrt.q
cfd_lom_long.q
cfd_lom_serl.q
cfd_lom_shrt.q
cfd_xtr_expr.q
cfd_xtr_iact.q
glm_dlc_long.q
glm_dlc_shrt.q
glm_qdc_long.q
glm_qdc_shrt.q
mpc_big_long.q
mpc_big_shrt.q
mpc_std_long.q
mpc_std_shrt.q
mpc_xtr_ctrl.q
mpc_xtr_iact.q
mpc_xtr_subq.q
uv100_smp_long.q
uv100_smp_shrt.q
  
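If you want to process the queue list in a script, a minimal Python sketch could look as follows. The function names are hypothetical; list_queues only works on a machine where qconf is installed, so the demonstration below uses the listing above as sample input. Grouping by the part of the name before the first underscore is a purely textual convenience and makes no claim about what the prefixes mean.

```python
import subprocess
from collections import defaultdict


def parse_queue_list(text):
    """Turn the output of `qconf -sql` into a list of queue names."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def list_queues():
    """Run `qconf -sql` (requires a host with grid engine installed)."""
    result = subprocess.run(["qconf", "-sql"], capture_output=True,
                            text=True, check=True)
    return parse_queue_list(result.stdout)


def group_by_prefix(queues):
    """Group queue names by the text before the first underscore."""
    groups = defaultdict(list)
    for name in queues:
        groups[name.split("_", 1)[0]].append(name)
    return dict(groups)


# Sample input: the listing shown above.
SAMPLE = """
cfd_him_long.q
cfd_him_shrt.q
cfd_lom_long.q
cfd_lom_serl.q
cfd_lom_shrt.q
cfd_xtr_expr.q
cfd_xtr_iact.q
glm_dlc_long.q
glm_dlc_shrt.q
glm_qdc_long.q
glm_qdc_shrt.q
mpc_big_long.q
mpc_big_shrt.q
mpc_std_long.q
mpc_std_shrt.q
mpc_xtr_ctrl.q
mpc_xtr_iact.q
mpc_xtr_subq.q
uv100_smp_long.q
uv100_smp_shrt.q
"""

for prefix, names in group_by_prefix(parse_queue_list(SAMPLE)).items():
    print(prefix, len(names))
```

On the cluster itself you would replace parse_queue_list(SAMPLE) by list_queues().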

Obtaining elaborate information for a particular queue

To obtain more details about the configuration of a particular queue, you just need to specify that queue. For example, to get detailed information on the queue mpc_std_shrt.q, type qconf -sq mpc_std_shrt.q, which yields

 
qname                 mpc_std_shrt.q
hostlist              @mpcs
seq_no                10000,[mpcs001.mpinet.cluster=10001], \
                      [mpcs002.mpinet.cluster=10002], \
                      ...
                      [mpcs123.mpinet.cluster=10123], \
                      [mpcs124.mpinet.cluster=10124]
load_thresholds       np_load_avg=1.75,slots=0
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH
ckpt_list             NONE
pe_list               impi impi41 linda molcas mpich mpich2_mpd mpich2_smpd \
                      openmpi smp mdcs
rerun                 FALSE
slots                 12
tmpdir                /scratch
shell                 /bin/bash
prolog                root@/cm/shared/apps/sge/scripts/prolog_mpc.sh
epilog                root@/cm/shared/apps/sge/scripts/epilog_mpc.sh
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            herousers
xuser_lists           NONE
subordinate_list      NONE
complex_values        h_vmem=23G,h_fsize=800G,cluster=hero
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  192:0:0
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY
  
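The output above is a list of keyword/value lines in which long values are continued with a trailing backslash. As a rough sketch, assuming exactly this layout, the following Python function (a hypothetical helper, not part of grid engine) turns such text into a dictionary. The sample is an excerpt of the output shown above.

```python
def parse_queue_config(text):
    """Parse `qconf -sq <queue>` output into a dict of keyword -> value."""
    logical_lines = []
    pending = ""
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.endswith("\\"):
            # Continuation: keep the text and wait for the next line.
            pending += stripped[:-1].strip() + " "
            continue
        logical_lines.append(pending + stripped)
        pending = ""

    config = {}
    for line in logical_lines:
        parts = line.split(None, 1)  # keyword, then the rest of the line
        config[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return config


# Excerpt of the configuration shown above (raw string keeps the backslash).
SAMPLE = r"""
qname                 mpc_std_shrt.q
hostlist              @mpcs
pe_list               impi impi41 linda molcas mpich mpich2_mpd mpich2_smpd \
                      openmpi smp mdcs
slots                 12
complex_values        h_vmem=23G,h_fsize=800G,cluster=hero
h_rt                  192:0:0
"""

config = parse_queue_config(SAMPLE)
print(config["qname"])
print(config["pe_list"].split())
print(config["h_rt"])
```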

Among the listed keywords there are a few that stand out:

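  • pe_list: specifies the list of parallel environments (http://wiki.hpcuser.uni-oldenburg.de/index.php?title=SGE_Job_Management_(Queueing)_System#Parallel_environments_.28PEs.29) available for the queue.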
  • hostlist: specifies the list of hosts on which the respective queue is implemented. Here, the name of the hostlist is @mpcs. You can view the hosts in this group by means of the command qconf -shgrp @mpcs, where -shgrp stands for show (s) host group (hgrp). The related option -shgrpl, show (s) host group (hgrp) list (l), prints the names of all host groups.
  • complex_values:
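
To work with these keywords in a script, their values can be split into Python structures. The sketch below only relies on the textual form seen in the qconf -sq output above (a space-separated list for pe_list, comma-separated name=value pairs for complex_values); the function names are hypothetical.

```python
def parse_pe_list(value):
    """Split a pe_list value into the names of the parallel environments."""
    if value.strip().upper() == "NONE":
        return []
    return value.split()


def parse_complex_values(value):
    """Split a complex_values string into a dict of name -> value."""
    if value.strip().upper() == "NONE":
        return {}
    result = {}
    for item in value.split(","):
        name, _, setting = item.partition("=")
        if name.strip():
            result[name.strip()] = setting.strip()
    return result


# Values copied from the mpc_std_shrt.q configuration above.
pes = parse_pe_list("impi impi41 linda molcas mpich mpich2_mpd mpich2_smpd openmpi smp mdcs")
complexes = parse_complex_values("h_vmem=23G,h_fsize=800G,cluster=hero")

print("smp" in pes)
print(complexes["h_vmem"])
```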

Listing jobs on a particular queue

Long and short queues

Addressing particular hardware components/queues