Information on used Resources
Latest revision as of 11:46, 7 March 2017
Getting Information about Job Resources
It can be useful to know how many resources a job has used, for example, to adjust the resources requested for a similar subsequent job. The sacct command can be used for this purpose. Information about a specific job is obtained with the option -j <job-id>, e.g.:
$ sacct -j 21400
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
21400           g09test     carl.p                     8  COMPLETED      0:0
21400.batch       batch                                8  COMPLETED      0:0
The command returns information about the job itself (first line) and about the individual job steps (following lines). A job step is defined by an srun command used within a job script. In addition, the job script itself is shown as a special step in the line <job-id>.batch (in the example, no srun job steps are present).
To get more detailed information, you can use the --format= option or, alternatively, the -o option. A useful command is
$ sacct -j 21400 --format=JobID,User,Node,AllocTRES%30,Elapsed,MaxRSS
       JobID      User        NodeList                      AllocTRES    Elapsed     MaxRSS
------------ --------- --------------- ------------------------------ ---------- ----------
21400         abcd1234         mpcl001         cpu=8,mem=3994M,node=1   00:02:11
21400.batch                    mpcl001         cpu=8,mem=3994M,node=1   00:02:11   4048152K
which lists the JobID, the user name, the node list, the resources allocated for the job, the elapsed time (wallclock time), and the memory used (MaxRSS). In the example, the used memory of 4048152K (=3953.3M) is just a little smaller than the requested amount of 3994M.
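The conversion from kilobytes to megabytes used above (1M = 1024K) can be sketched as a small shell helper. The function name maxrss_to_mb is made up for this illustration and is not part of Slurm. It assumes that the MaxRSS value ends in K, as in the example output.

```shell
# Hypothetical helper (not a Slurm command): convert a MaxRSS value
# given in kilobytes, e.g. 4048152K, to megabytes with one decimal.
# Assumption: the value carries a trailing K as in the example above.
maxrss_to_mb() {
    kb=${1%K}   # strip the trailing K
    awk -v kb="$kb" 'BEGIN { printf "%.1f\n", kb / 1024 }'
}

maxrss_to_mb 4048152K   # prints 3953.3
```

Comparing the result with the mem= entry in AllocTRES shows how close the job came to its memory request.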
The --format= option can be extended with additional fields as desired. The width of an output field can be changed by appending %n to the field name, as done above with %30 for AllocTRES. A complete list of fields is given by the command:
$ sacct -e
AllocCPUS           AllocGRES           AllocNodes          AllocTRES
Account             AssocID             AveCPU              AveCPUFreq
AveDiskRead         AveDiskWrite        AvePages            AveRSS
AveVMSize           BlockID             Cluster             Comment
ConsumedEnergy      ConsumedEnergyRaw   CPUTime             CPUTimeRAW
DerivedExitCode     Elapsed             Eligible            End
ExitCode            GID                 Group               JobID
JobIDRaw            JobName             Layout              MaxDiskRead
MaxDiskReadNode     MaxDiskReadTask     MaxDiskWrite        MaxDiskWriteNode
MaxDiskWriteTask    MaxPages            MaxPagesNode        MaxPagesTask
MaxRSS              MaxRSSNode          MaxRSSTask          MaxVMSize
MaxVMSizeNode       MaxVMSizeTask       MinCPU              MinCPUNode
MinCPUTask          NCPUS               NNodes              NodeList
NTasks              Priority            Partition           QOS
QOSRAW              ReqCPUFreq          ReqCPUFreqMin       ReqCPUFreqMax
ReqCPUFreqGov       ReqCPUS             ReqGRES             ReqMem
ReqNodes            ReqTRES             Reservation         ReservationId
Reserved            ResvCPU             ResvCPURAW          Start
State               Submit              Suspended           SystemCPU
Timelimit           TotalCPU            UID                 User
UserCPU             WCKey               WCKeyID
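Some of the more interesting ones are AllocGRES, AllocTRES, AveRSS, CPUTime, Elapsed, JobID, JobName, MaxRSS, MaxRSSNode, MaxRSSTask, NCPUS, NNodes, NTasks, Partition, State, Timelimit, and User. Unlike SGE's qacct, sacct also works for running jobs, although some information may not be available until the job has completed.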
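As an illustration of combining several of these fields, the following sketch builds a longer format string and passes it to sacct. The choice of fields and the width of 20 for JobName are only an example, and the variable names fields and job_id are introduced here for readability. The call is guarded so that the snippet does nothing on a machine without Slurm.

```shell
# Example field selection; %20 widens the JobName column to 20 characters.
fields="JobID,JobName%20,Partition,NCPUS,Elapsed,Timelimit,State,MaxRSS"
job_id=21400   # job ID taken from the examples above

# Only call sacct where it is installed (e.g. on a cluster login node).
if command -v sacct >/dev/null 2>&1; then
    sacct -j "$job_id" --format="$fields"
fi
```

Comparing Elapsed with Timelimit in this output helps when choosing the time limit for a similar job.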