Difference between revisions of "Information on used Resources"

Revision as of 17:09, 27 February 2017

Getting Information about Job Resources

It can be useful to know how many resources a job has used, for example to adjust the resources requested for a similar follow-up job. The sacct-command can be used for this purpose. Information about a specific job is obtained with the option -j <job-id>, e.g.:

$ sacct -j 21400
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
21400           g09test     carl.p                     8  COMPLETED      0:0 
21400.batch       batch                                8  COMPLETED      0:0

The command returns information about the job itself (first line) and its individual job steps (following lines). A job step is created by each srun-command used within the job script. In addition, the job script itself is shown as a special step in the line <job-id>.batch (the example contains no srun job steps).

To get more detailed information you can use the --format=-option. A useful command is

$ sacct -j 21400 --format=JobID,User,Node,AllocTRES%30,Elapsed,MaxRSS
       JobID      User        NodeList                      AllocTRES    Elapsed     MaxRSS 
------------ --------- --------------- ------------------------------ ---------- ---------- 
21400         abcd1234         mpcl001         cpu=8,mem=3994M,node=1   00:02:11            
21400.batch                    mpcl001         cpu=8,mem=3994M,node=1   00:02:11   4048152K 

which lists the JobID, the user name, the node list, the resources allocated for the job, the elapsed time (wallclock time), and the memory used (MaxRSS). In the example, the used memory of 4048152K (=3953.3M) is just a little smaller than the requested amount of 3994M.
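The unit conversion in this comparison can be sketched with a small shell helper. The function name to_mib is hypothetical and not part of Slurm. It assumes the value carries a K, M, or G suffix as in the sacct output above (1M = 1024K) and treats anything else as plain bytes.

```shell
# Convert a sacct memory value such as "4048152K" to megabytes (1M = 1024K).
# to_mib is a hypothetical helper, not a Slurm command.
to_mib() {
    awk -v v="$1" 'BEGIN {
        unit = substr(v, length(v), 1)   # last character, e.g. "K"
        num  = v + 0                     # leading numeric part of the value
        if (unit == "K")      num = num / 1024
        else if (unit == "G") num = num * 1024
        else if (unit != "M") num = num / (1024 * 1024)   # assumption: plain bytes
        printf "%.1f\n", num
    }'
}

to_mib 4048152K   # MaxRSS from the example, prints 3953.3
to_mib 3994M      # requested memory, prints 3994.0
```

With both values in the same unit it is easy to see how close the job came to its memory request.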

The --format=-option can be extended with additional fields as desired. The output field size can be modified using %n as above for AllocTRES. A complete list of fields is given by the command:

$ sacct -e
AllocCPUS         AllocGRES         AllocNodes        AllocTRES        
Account           AssocID           AveCPU            AveCPUFreq       
AveDiskRead       AveDiskWrite      AvePages          AveRSS           
AveVMSize         BlockID           Cluster           Comment          
ConsumedEnergy    ConsumedEnergyRaw CPUTime           CPUTimeRAW       
DerivedExitCode   Elapsed           Eligible          End              
ExitCode          GID               Group             JobID            
JobIDRaw          JobName           Layout            MaxDiskRead      
MaxDiskReadNode   MaxDiskReadTask   MaxDiskWrite      MaxDiskWriteNode 
MaxDiskWriteTask  MaxPages          MaxPagesNode      MaxPagesTask     
MaxRSS            MaxRSSNode        MaxRSSTask        MaxVMSize        
MaxVMSizeNode     MaxVMSizeTask     MinCPU            MinCPUNode       
MinCPUTask        NCPUS             NNodes            NodeList         
NTasks            Priority          Partition         QOS              
QOSRAW            ReqCPUFreq        ReqCPUFreqMin     ReqCPUFreqMax    
ReqCPUFreqGov     ReqCPUS           ReqGRES           ReqMem           
ReqNodes          ReqTRES           Reservation       ReservationId    
Reserved          ResvCPU           ResvCPURAW        Start            
State             Submit            Suspended         SystemCPU        
Timelimit         TotalCPU          UID               User             
UserCPU           WCKey             WCKeyID          

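Some of the more interesting ones are AllocGRES, AllocTRES, AveRSS, CPUTime, Elapsed, JobID, JobName, MaxRSS, MaxRSSNode, MaxRSSTask, NCPUS, NNodes, NTasks, Partition, State, Timelimit, and User.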
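As an illustration, several fields from the list above can be combined into one format string. This is only a sketch: the selection of fields and the width of 20 for JobName are arbitrary choices, and the job ID 21400 is the one from the earlier example. The check for sacct is there only so that the snippet does nothing harmful on a machine without Slurm.

```shell
# Fields taken from the output of "sacct -e"; "%20" widens JobName to 20 characters.
fields="JobID,JobName%20,Partition,NCPUS,NNodes,Elapsed,Timelimit,MaxRSS,State"

# Job ID 21400 is the one from the example above; replace it with your own.
if command -v sacct >/dev/null 2>&1; then
    sacct -j 21400 --format="$fields"
else
    echo "sacct not found; would run: sacct -j 21400 --format=$fields"
fi
```

Comparing Elapsed with Timelimit and MaxRSS with the requested memory gives a quick impression of whether the requests of a similar job can be reduced.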