Difference between revisions of "Queues and resource allocation"
Revision as of 14:40, 26 August 2013
In general, you do not have to worry about queues. Ideally, you only specify the resources for the job you are about to submit. In doing so, you give the scheduler enough information to decide which queue the job belongs in. Hence, you explicitly allocate resources and implicitly choose a queue. In some cases, however, for example when a job has to run on particular hardware components of the cluster, it helps to know which resources must be requested in order to reach a queue running on that component.
Although you (as a user) should care more about specifying resources than about targeting queues, it is useful to disentangle the relationship between the queues implemented on the HPC system and the resources that must be specified for the scheduler to address a given queue. Also, some of you might already be familiar with the concept of queues and prefer to think in terms of them.
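The idea of allocating resources explicitly and thereby choosing a queue implicitly can be illustrated with a toy model. The following is only a sketch: the queue names and limits are hypothetical, and the real grid engine scheduler takes many more criteria into account than these two limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Queue:
    name: str
    max_runtime_h: int  # hypothetical wall-clock limit in hours
    max_vmem_gb: int    # hypothetical memory limit in GB


# Hypothetical queues, for illustration only. Real limits are part of
# the queue configuration on the cluster.
QUEUES = [
    Queue("short.q", max_runtime_h=24, max_vmem_gb=23),
    Queue("long.q", max_runtime_h=192, max_vmem_gb=23),
    Queue("big.q", max_runtime_h=192, max_vmem_gb=46),
]


def eligible_queues(runtime_h, vmem_gb, queues=QUEUES):
    """Return the names of all queues whose limits cover the request."""
    return [
        q.name
        for q in queues
        if runtime_h <= q.max_runtime_h and vmem_gb <= q.max_vmem_gb
    ]


# The user states only resources; the set of possible queues follows.
print(eligible_queues(runtime_h=8, vmem_gb=4))
print(eligible_queues(runtime_h=100, vmem_gb=30))
```

A small request fits every queue in this model, while a long, memory-hungry request is left with a single candidate.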
Listing all possible queues
Now, thinking in terms of queues, you might be interested to see which queues there are on the HPC system. Logged in to your HPC account, you obtain a full list of all possible queues a job might be placed in by typing the command qconf -sql. qconf is a grid engine configuration tool which, among other things, allows you to list existing queues and queue configurations. In casual terms, the sequence of options -sql demands: show (s) queue (q) list (l).
As a result you might find the following list of queues:
cfd_him_long.q
cfd_him_shrt.q
cfd_lom_long.q
cfd_lom_serl.q
cfd_lom_shrt.q
cfd_xtr_expr.q
cfd_xtr_iact.q
glm_dlc_long.q
glm_dlc_shrt.q
glm_qdc_long.q
glm_qdc_shrt.q
mpc_big_long.q
mpc_big_shrt.q
mpc_std_long.q
mpc_std_shrt.q
mpc_xtr_ctrl.q
mpc_xtr_iact.q
mpc_xtr_subq.q
uv100_smp_long.q
uv100_smp_shrt.q
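If you want to get an overview of such a list in a script, the names can be grouped by their first underscore-separated field. The sketch below works on the queue list shown above, pasted in as a string. Treating the first field as a group label is an assumption about the naming scheme, not something the grid engine enforces.

```python
from collections import defaultdict

# Queue names as printed by "qconf -sql" (copied from the list above).
QUEUE_LIST = """\
cfd_him_long.q cfd_him_shrt.q cfd_lom_long.q cfd_lom_serl.q
cfd_lom_shrt.q cfd_xtr_expr.q cfd_xtr_iact.q glm_dlc_long.q
glm_dlc_shrt.q glm_qdc_long.q glm_qdc_shrt.q mpc_big_long.q
mpc_big_shrt.q mpc_std_long.q mpc_std_shrt.q mpc_xtr_ctrl.q
mpc_xtr_iact.q mpc_xtr_subq.q uv100_smp_long.q uv100_smp_shrt.q
"""


def group_by_prefix(text):
    """Group queue names by the part before the first underscore."""
    groups = defaultdict(list)
    for name in text.split():
        prefix = name.split("_", 1)[0]
        groups[prefix].append(name)
    return dict(groups)


groups = group_by_prefix(QUEUE_LIST)
for prefix in sorted(groups):
    print(prefix, len(groups[prefix]))
```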
Obtaining detailed information for a particular queue
To obtain more details about the configuration of a particular queue, you just need to specify that queue. For example, to get detailed information on the queue mpc_std_shrt.q, type qconf -sq mpc_std_shrt.q, which yields:
qname                 mpc_std_shrt.q
hostlist              @mpcs
seq_no                10000,[mpcs001.mpinet.cluster=10001], \
                      [mpcs002.mpinet.cluster=10002], \
                      ...
                      [mpcs123.mpinet.cluster=10123], \
                      [mpcs124.mpinet.cluster=10124]
load_thresholds       np_load_avg=1.75,slots=0
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH
ckpt_list             NONE
pe_list               impi impi41 linda molcas mpich mpich2_mpd mpich2_smpd \
                      openmpi smp mdcs
rerun                 FALSE
slots                 12
tmpdir                /scratch
shell                 /bin/bash
prolog                root@/cm/shared/apps/sge/scripts/prolog_mpc.sh
epilog                root@/cm/shared/apps/sge/scripts/epilog_mpc.sh
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            herousers
xuser_lists           NONE
subordinate_list      NONE
complex_values        h_vmem=23G,h_fsize=800G,cluster=hero
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  192:0:0
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY
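The output of qconf -sq consists of keyword/value pairs, where long values continue on the next line after a trailing backslash. If you want to process such a listing in a script, a minimal parser might look as follows. This is a sketch under the assumption that the layout is exactly as shown here; SAMPLE is an abbreviated copy of the listing, and parse_queue_config is a hypothetical helper, not part of the grid engine.

```python
# Abbreviated copy of the "qconf -sq mpc_std_shrt.q" output.
SAMPLE = """\
qname                 mpc_std_shrt.q
hostlist              @mpcs
pe_list               impi impi41 linda molcas mpich mpich2_mpd mpich2_smpd \\
                      openmpi smp mdcs
slots                 12
complex_values        h_vmem=23G,h_fsize=800G,cluster=hero
h_rt                  192:0:0
"""


def parse_queue_config(text):
    """Turn keyword/value lines into a dict of raw strings."""
    # First pass: join lines that end in a continuation backslash.
    joined, pending = [], ""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.endswith("\\"):
            pending += stripped[:-1].strip() + " "
            continue
        joined.append(pending + stripped)
        pending = ""
    if pending:
        joined.append(pending.strip())

    # Second pass: the first word is the keyword, the rest is the value.
    config = {}
    for line in joined:
        if not line:
            continue
        key, _, value = line.partition(" ")
        config[key] = value.strip()
    return config


config = parse_queue_config(SAMPLE)
print(config["qname"])
print(config["pe_list"].split())
```

All values are kept as strings, since the meaning of a value (time limit, list, number) depends on the keyword.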
Among the listed keywords there are a few that stand out:
* pe_list: specifies the list of parallel environments available for the queue.
* hostlist: specifies the list of hosts on which the respective queue is implemented. Here, the name of the host list is @mpcs. You can view its members by means of the command qconf -shgrp @mpcs, where -shgrp stands for show (s) host group (hgrp). The related option -shgrpl, with a trailing l for list, prints the names of all host groups instead.
* complex_values:
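In the listing, pe_list holds a whitespace-separated list and complex_values holds comma-separated name=value pairs. A sketch of how these two strings could be split up in a script is given below. The helper names are hypothetical, and treating the word NONE as "empty" is an assumption based on how other keywords appear in the listing.

```python
def parse_pe_list(value):
    """Split a pe_list value into the names of the parallel environments."""
    return [] if value == "NONE" else value.split()


def parse_complex_values(value):
    """Split a complex_values string into a dict of name -> value."""
    if value == "NONE":
        return {}
    return dict(item.split("=", 1) for item in value.split(","))


# Values copied from the queue configuration shown above.
pes = parse_pe_list(
    "impi impi41 linda molcas mpich mpich2_mpd mpich2_smpd openmpi smp mdcs"
)
complexes = parse_complex_values("h_vmem=23G,h_fsize=800G,cluster=hero")

print("smp" in pes)
print(complexes["h_vmem"])
```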