Information on used Resources

From HPC users
Revision as of 11:46, 7 March 2017 by Fajen (talk | contribs)

Getting Information about Job Resources

It can be useful to know how many resources a job has used, for example to adjust the resources requested for a similar follow-up job. The sacct command can be used for this purpose. Information about a specific job is obtained with the option -j <job-id>, e.g.:

$ sacct -j 21400
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
21400           g09test     carl.p                     8  COMPLETED      0:0
21400.batch       batch                                8  COMPLETED      0:0

The command returns information about the job itself (first line) and about its individual job steps (following lines). A job step is created by each srun command used within a job script. In addition, the job script itself is shown as a special step in the line <job-id>.batch (the example contains no srun job steps).
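
The distinction between these line types can be read off the JobID column. The following is a minimal sketch: step_kind is a hypothetical helper name (not part of Slurm), and it assumes that srun job steps appear as <job-id>.<number>, e.g. 21400.0, which is not shown in the example above.

```shell
# step_kind is a hypothetical helper, not a Slurm command.
# It classifies one value from the JobID column of the sacct output.
step_kind() {
    case "$1" in
        *.batch) echo "batch script" ;;  # the job script itself
        *.*)     echo "job step" ;;      # assumed form <job-id>.<number> for srun steps
        *)       echo "job" ;;           # the job as a whole
    esac
}

step_kind 21400         # prints: job
step_kind 21400.batch   # prints: batch script
```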

To get more detailed information you can use the --format option or, alternatively, its short form -o. A useful command is

$ sacct -j 21400 --format=JobID,User,Node,AllocTRES%30,Elapsed,MaxRSS
       JobID      User        NodeList                      AllocTRES    Elapsed     MaxRSS
------------ --------- --------------- ------------------------------ ---------- ----------
21400         abcd1234         mpcl001         cpu=8,mem=3994M,node=1   00:02:11
21400.batch                    mpcl001         cpu=8,mem=3994M,node=1   00:02:11   4048152K

which lists the job ID, the user name, the node list, the resources allocated to the job, the elapsed (wall-clock) time, and the memory used (MaxRSS). In the example, the memory used, 4048152K (= 3953.3M), is just below the requested amount of 3994M.
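
The conversion from K to M can be reproduced on the command line. This is only a sketch: kib_to_mib is a hypothetical helper name, and it assumes that the MaxRSS value carries a K suffix as in the output above and that 1M = 1024K.

```shell
# kib_to_mib is a hypothetical helper, not a Slurm command.
# It converts a memory value with a K suffix (e.g. 4048152K) to M,
# assuming 1M = 1024K, and prints it with one decimal place.
kib_to_mib() {
    # ${1%K} strips the trailing K before the division.
    awk -v k="${1%K}" 'BEGIN { printf "%.1f\n", k / 1024 }'
}

kib_to_mib 4048152K   # prints 3953.3
```

Comparing the result with the mem= entry of AllocTRES (3994M in the example) shows how close the job came to its memory request.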

The --format option can be extended with additional fields as desired. The width of an output field can be set by appending %n to the field name, as done above with AllocTRES%30. A complete list of fields is given by the command:

$ sacct -e
AllocCPUS AllocGRES AllocNodes AllocTRES 
Account AssocID AveCPU AveCPUFreq 
AveDiskRead AveDiskWrite AvePages AveRSS 
AveVMSize BlockID Cluster Comment 
ConsumedEnergy ConsumedEnergyRaw CPUTime CPUTimeRAW 
DerivedExitCode Elapsed Eligible End 
ExitCode GID Group JobID 
JobIDRaw JobName Layout MaxDiskRead 
MaxDiskReadNode MaxDiskReadTask MaxDiskWrite MaxDiskWriteNode 
MaxDiskWriteTask MaxPages MaxPagesNode MaxPagesTask 
MaxRSS MaxRSSNode MaxRSSTask MaxVMSize 
MaxVMSizeNode MaxVMSizeTask MinCPU MinCPUNode 
MinCPUTask NCPUS NNodes NodeList 
NTasks Priority Partition QOS 
QOSRAW ReqCPUFreq ReqCPUFreqMin ReqCPUFreqMax 
ReqCPUFreqGov ReqCPUS ReqGRES ReqMem 
ReqNodes ReqTRES Reservation ReservationId 
Reserved ResvCPU ResvCPURAW Start 
State Submit Suspended SystemCPU 
Timelimit TotalCPU UID User 
UserCPU WCKey WCKeyID 
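
Because the list is long, it can help to search it for a keyword. The following sketch uses only standard tools; filter_fields is a hypothetical helper name, and it assumes the field names are separated by spaces as in the listing above.

```shell
# filter_fields is a hypothetical helper, not a Slurm command.
# It reads a field list (such as the output of "sacct -e") on standard
# input and prints, one per line, the field names that contain the given
# text, ignoring case.
filter_fields() {
    # Put every field on its own line, then keep the matching ones.
    tr -s ' ' '\n' | grep -i -- "$1"
}

# Demonstration with one line taken from the listing above:
printf '%s\n' 'MaxRSS MaxRSSNode MaxRSSTask MaxVMSize' | filter_fields rss
```

On a system with Slurm installed, the same filter can be applied directly with: sacct -e | filter_fields rss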

Some of the more interesting ones are highlighted in bold. Unlike SGE's qacct, sacct also works for running jobs; however, some information may not be available until the job has completed.