Difference between revisions of "Information on used Resources"

From HPC users

Latest revision as of 11:46, 7 March 2017

Getting Information about Job Resources

It can be useful to know how many resources a job has used, for example, to adjust the resources requested for a similar subsequent job. The sacct command can be used for this purpose. Information about a specific job is obtained with the option -j <job-id>, e.g.:

$ sacct -j 21400
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
21400           g09test     carl.p                     8  COMPLETED      0:0
21400.batch       batch                                8  COMPLETED      0:0

The command returns information about the job itself (first line) and its individual job steps (following lines). A job step is created by each srun command used within a job script. In addition, the job script itself is shown as a special step in the line <job-id>.batch (the example contains no srun job steps).
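The distinction between the job line, the batch step, and srun steps can be sketched with a small helper. This is only an illustration: the function name classify_jobid is hypothetical, and it assumes that srun steps appear with a numeric suffix such as 21400.0, which is how Slurm usually numbers them. Other special steps that a site may show (for example an extern step) would simply be reported as a job step here.

```shell
# Classify a JobID value from sacct output as the job itself,
# the batch step (the job script), or any other job step.
# classify_jobid is a hypothetical helper, not part of Slurm.
classify_jobid() {
  case "$1" in
    *.batch) echo "batch step" ;;  # the job script itself
    *.*)     echo "job step" ;;    # e.g. a step started by srun
    *)       echo "job" ;;         # the overall job line
  esac
}

classify_jobid 21400
classify_jobid 21400.batch
classify_jobid 21400.0
```

For the example job above, only the first two kinds of lines occur.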

To get more detailed information you can use the --format= option or, alternatively, the -o option. A useful command is

$ sacct -j 21400 --format=JobID,User,Node,AllocTRES%30,Elapsed,MaxRSS
       JobID      User        NodeList                      AllocTRES    Elapsed     MaxRSS
------------ --------- --------------- ------------------------------ ---------- ----------
21400         abcd1234         mpcl001         cpu=8,mem=3994M,node=1   00:02:11
21400.batch                    mpcl001         cpu=8,mem=3994M,node=1   00:02:11   4048152K

which lists the JobID, the user name, the node list, the resources allocated for the job, the elapsed time (wallclock time), and the memory used (MaxRSS). In the example, the used memory of 4048152K (=3953.3M) is just a little smaller than the requested amount of 3994M.
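The unit conversion in this comparison can be reproduced with a short shell function. This is a minimal sketch: the helper name to_megabytes is an assumption made for illustration, and it only handles values with a K, M, or G suffix, using 1M = 1024K as in the calculation above.

```shell
# Convert a memory value with a K, M or G suffix to megabytes (1M = 1024K).
# to_megabytes is a hypothetical helper, not part of Slurm.
to_megabytes() {
  awk -v v="$1" 'BEGIN {
    unit = substr(v, length(v), 1)       # last character is the unit
    num  = substr(v, 1, length(v) - 1)   # everything before it is the number
    if (unit == "K")      factor = 1 / 1024
    else if (unit == "M") factor = 1
    else if (unit == "G") factor = 1024
    else { print "unsupported unit: " v > "/dev/stderr"; exit 1 }
    printf "%.1f\n", num * factor
  }'
}

used=$(to_megabytes 4048152K)      # MaxRSS of the batch step in the example
requested=$(to_megabytes 3994M)    # mem= value from AllocTRES
echo "used ${used}M of ${requested}M"
```

With the values from the example this prints "used 3953.3M of 3994.0M", which shows how little headroom the job had.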

The --format= option can be extended with additional fields as desired. The width of an output field can be changed by appending %n to the field name, as above for AllocTRES (%30). A complete list of fields is given by the command:

$ sacct -e
AllocCPUS         AllocGRES         AllocNodes        AllocTRES
Account           AssocID           AveCPU            AveCPUFreq
AveDiskRead       AveDiskWrite      AvePages          AveRSS
AveVMSize         BlockID           Cluster           Comment
ConsumedEnergy    ConsumedEnergyRaw CPUTime           CPUTimeRAW
DerivedExitCode   Elapsed           Eligible          End
ExitCode          GID               Group             JobID
JobIDRaw          JobName           Layout            MaxDiskRead
MaxDiskReadNode   MaxDiskReadTask   MaxDiskWrite      MaxDiskWriteNode
MaxDiskWriteTask  MaxPages          MaxPagesNode      MaxPagesTask
MaxRSS            MaxRSSNode        MaxRSSTask        MaxVMSize
MaxVMSizeNode     MaxVMSizeTask     MinCPU            MinCPUNode
MinCPUTask        NCPUS             NNodes            NodeList
NTasks            Priority          Partition         QOS
QOSRAW            ReqCPUFreq        ReqCPUFreqMin     ReqCPUFreqMax
ReqCPUFreqGov     ReqCPUS           ReqGRES           ReqMem
ReqNodes          ReqTRES           Reservation       ReservationId
Reserved          ResvCPU           ResvCPURAW        Start
State             Submit            Suspended         SystemCPU
Timelimit         TotalCPU          UID               User
UserCPU           WCKey             WCKeyID

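Some of the more interesting fields are AllocGRES, AllocTRES, AveRSS, CPUTime, Elapsed, JobID, JobName, MaxRSS, MaxRSSNode, MaxRSSTask, NCPUS, NNodes, NTasks, Partition, State, Timelimit, and User. Unlike SGE's qacct, sacct also works for running jobs, although some information may not be available until the job has completed.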
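When the selected fields are to be processed by a script rather than read by eye, the fixed-width layout is awkward to parse. As a sketch, the following assumes that sacct is called with its --parsable2 and --noheader options, which produce pipe-delimited lines without a header; these options are not discussed on this page, so check the sacct manual page on your cluster. The helper max_rss_k is hypothetical and only considers MaxRSS values given with a K suffix.

```shell
# Print the largest MaxRSS value (in K) found in pipe-delimited sacct output
# read from standard input. Expected columns: JobID|Elapsed|MaxRSS
# max_rss_k is a hypothetical helper, not part of Slurm.
max_rss_k() {
  awk -F'|' '
    $3 ~ /K$/ {                                   # skip lines with empty MaxRSS
      v = substr($3, 1, length($3) - 1) + 0       # strip the K suffix
      if (v > max) max = v
    }
    END { print max + 0 }'
}

# On the cluster the input would come from sacct (not run here):
#   sacct -j 21400 --noheader --parsable2 --format=JobID,Elapsed,MaxRSS | max_rss_k

# Sample input that mirrors the example job above:
printf '%s\n' '21400|00:02:11|' '21400.batch|00:02:11|4048152K' | max_rss_k
```

For a job that is still running, the MaxRSS column may be empty on every line, in which case the helper prints 0.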