Information on used Resources

From HPC users

Getting Information about Job Resources

It can be useful to know how many resources a job has used, for example to adjust the requested resources of a similar job submitted later. The sacct command can be used for this purpose. Information about a specific job is obtained with the option -j <job-id>, e.g.:

$ sacct -j 21400
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
21400           g09test     carl.p                     8  COMPLETED      0:0 
21400.batch       batch                                8  COMPLETED      0:0

The command returns information about the job itself (first line) and about its individual job steps (following lines). Each srun command used within a job script creates a job step. In addition, the job script itself is shown as a special step in the line <job-id>.batch (the example contains no srun job steps).
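When the output of sacct is processed by a script, the job line and the step lines can be told apart by their JobID. The following Python sketch assumes sacct's -P (--parsable2) and -n (--noheader) options, which print '|'-separated fields without a header line; the function names run_sacct and split_job_and_steps are hypothetical. It treats every JobID containing a dot (such as 21400.batch) as a step, which fits the example above but may need adjusting for array or heterogeneous jobs.

```python
from __future__ import annotations

import subprocess


def run_sacct(job_id: str, fields: list[str]) -> str:
    """Query one job; -P prints '|'-separated fields and -n drops the header line."""
    cmd = ["sacct", "-j", job_id, "-P", "-n", "--format=" + ",".join(fields)]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def split_job_and_steps(output: str, fields: list[str]) -> tuple[dict | None, list[dict]]:
    """Split parsable sacct output into the job record and its step records."""
    job, steps = None, []
    for line in output.splitlines():
        if not line.strip():
            continue
        record = dict(zip(fields, line.split("|")))
        # Steps carry a suffix such as ".batch" or ".0"; the job line has none.
        if "." in record["JobID"]:
            steps.append(record)
        else:
            job = record
    return job, steps


if __name__ == "__main__":
    fields = ["JobID", "JobName", "State", "ExitCode"]
    # Offline sample shaped like the 21400 example; on a cluster use run_sacct("21400", fields).
    sample = "21400|g09test|COMPLETED|0:0\n21400.batch|batch|COMPLETED|0:0\n"
    job, steps = split_job_and_steps(sample, fields)
    print(job["JobName"], [s["JobID"] for s in steps])
```

Parsable output is used here because the column widths of the default table can truncate values, while '|'-separated fields can be split reliably.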

To get more detailed information, use the --format= option. A useful command is

$ sacct -j 21400 --format=JobID,User,Node,AllocTRES%30,Elapsed,MaxRSS
       JobID      User        NodeList                      AllocTRES    Elapsed     MaxRSS 
------------ --------- --------------- ------------------------------ ---------- ---------- 
21400         abcd1234         mpcl001         cpu=8,mem=3994M,node=1   00:02:11            
21400.batch                    mpcl001         cpu=8,mem=3994M,node=1   00:02:11   4048152K 

which lists the JobID, the user name, the node list, the resources allocated for the job, the elapsed (wallclock) time, and the peak memory used (MaxRSS, the maximum resident set size). MaxRSS is reported per job step, so it appears in the 21400.batch line rather than in the job line. In the example, the used memory of 4048152K (=3953.3M) is just slightly less than the requested amount of 3994M.
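The comparison of used and requested memory can also be scripted. The sketch below is an illustration under stated assumptions, not a site tool: it assumes that the K/M/G/T suffixes printed by sacct are powers of 1024 (consistent with 4048152K = 3953.3M above), and the helper names to_mib and parse_tres are hypothetical.

```python
from __future__ import annotations

# Assumption: sacct size suffixes are binary multiples (1K = 1024 bytes),
# which matches the 4048152K = 3953.3M conversion in the text.
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_mib(value: str) -> float:
    """Convert a sacct size such as '4048152K' or '3994M' to MiB."""
    value = value.strip()
    if not value or value[-1].upper() not in _UNITS:
        # Empty fields (e.g. MaxRSS on the job line) or unsuffixed numbers are rejected.
        raise ValueError(f"unsupported size: {value!r}")
    return float(value[:-1]) * _UNITS[value[-1].upper()] / 1024**2


def parse_tres(tres: str) -> dict[str, str]:
    """Split an AllocTRES string like 'cpu=8,mem=3994M,node=1' into a dict."""
    return dict(item.split("=", 1) for item in tres.split(",") if item)


if __name__ == "__main__":
    requested = to_mib(parse_tres("cpu=8,mem=3994M,node=1")["mem"])
    used = to_mib("4048152K")  # MaxRSS from the 21400.batch line
    print(f"used {used:.1f}M of {requested:.0f}M ({100 * used / requested:.1f}%)")
```

to_mib deliberately raises ValueError for empty or unsuffixed values rather than guessing a unit, so the MaxRSS value should be taken from a step line.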

The --format= option can be extended with additional fields as desired. The width of an output field can be set by appending %n to the field name, as done above with AllocTRES%30. A complete list of fields is given by the command:

$ sacct -e
AllocCPUS         AllocGRES         AllocNodes        AllocTRES        
Account           AssocID           AveCPU            AveCPUFreq       
AveDiskRead       AveDiskWrite      AvePages          AveRSS           
AveVMSize         BlockID           Cluster           Comment          
ConsumedEnergy    ConsumedEnergyRaw CPUTime           CPUTimeRAW       
DerivedExitCode   Elapsed           Eligible          End              
ExitCode          GID               Group             JobID            
JobIDRaw          JobName           Layout            MaxDiskRead      
MaxDiskReadNode   MaxDiskReadTask   MaxDiskWrite      MaxDiskWriteNode 
MaxDiskWriteTask  MaxPages          MaxPagesNode      MaxPagesTask     
MaxRSS            MaxRSSNode        MaxRSSTask        MaxVMSize        
MaxVMSizeNode     MaxVMSizeTask     MinCPU            MinCPUNode       
MinCPUTask        NCPUS             NNodes            NodeList         
NTasks            Priority          Partition         QOS              
QOSRAW            ReqCPUFreq        ReqCPUFreqMin     ReqCPUFreqMax    
ReqCPUFreqGov     ReqCPUS           ReqGRES           ReqMem           
ReqNodes          ReqTRES           Reservation       ReservationId    
Reserved          ResvCPU           ResvCPURAW        Start            
State             Submit            Suspended         SystemCPU        
Timelimit         TotalCPU          UID               User             
UserCPU           WCKey             WCKeyID          

Unlike SGE's qacct, sacct also works for running jobs, although some information may not be available until the job has completed.
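Several of the listed fields can be combined to judge whether the requested cores were actually used. The Python sketch below computes a hypothetical CPU-efficiency figure, TotalCPU / (Elapsed * AllocCPUS); the formula and the function names are assumptions for illustration. The time parser assumes Slurm's [D-]HH:MM:SS layout and the MM:SS.fff form that TotalCPU can take for short durations. For running jobs TotalCPU may be incomplete, so the figure is most meaningful after the job has finished.

```python
from __future__ import annotations


def slurm_time_to_seconds(value: str) -> float:
    """Convert Slurm durations such as '00:02:11', '1-02:03:04' or '05:12.345' to seconds."""
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    parts = [float(p) for p in value.split(":")]
    if len(parts) == 2:  # MM:SS(.fff), as TotalCPU prints short durations
        parts.insert(0, 0.0)
    if len(parts) != 3:
        raise ValueError(f"unexpected time format: {value!r}")
    hours, minutes, seconds = parts
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(total_cpu: str, elapsed: str, alloc_cpus: int) -> float:
    """Hypothetical metric: TotalCPU divided by the allocated core time (Elapsed * AllocCPUS)."""
    core_seconds = slurm_time_to_seconds(elapsed) * alloc_cpus
    if core_seconds == 0:
        return 0.0
    return slurm_time_to_seconds(total_cpu) / core_seconds


if __name__ == "__main__":
    # Made-up values for illustration, not taken from job 21400.
    print(f"{cpu_efficiency('10:00.000', '00:02:30', 8):.0%}")
```

A value well below 1 would suggest that fewer cores could be requested for a similar job; the inputs would come from a query such as sacct -j <job-id> --format=JobID,AllocCPUS,Elapsed,TotalCPU.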