Information on used Resources

From HPC users

Revision as of 17:05, 27 February 2017

Getting Information about Job Resources

It can be useful to know how many resources a job has actually used, for example to adjust the resources requested for a similar job later. The sacct command can be used for this purpose. Information about a specific job is obtained with the option -j <job-id>, e.g.:

$ sacct -j 21400
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
21400           g09test     carl.p                     8  COMPLETED      0:0 
21400.batch       batch                                8  COMPLETED      0:0

The command returns information about the job itself (first line) and its individual job steps (following lines). Each srun command within a job script starts a job step. In addition, the job script itself appears as a special step in the line <job-id>.batch (the example job contains no srun steps).
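To process this output in a script, it helps to separate the job line from the step lines. The bash sketch below assumes the output was produced with sacct's -P (--parsable2, "|"-delimited) and -n (--noheader) options, and relies on step IDs containing a dot (as in 21400.batch). The function names job_lines and step_lines are hypothetical. If only the job line is needed, sacct's -X (--allocations) option gives it directly.

```bash
# Hypothetical helpers: filter sacct output obtained with, e.g.,
#   sacct -j 21400 -P -n --format=JobID,JobName,State
# (-P: '|'-delimited fields, -n: no header line).

# Keep only the job (allocation) line: the JobID field contains no '.'.
job_lines() {
    awk -F'|' '$1 !~ /\./'
}

# Keep only the job steps (e.g. 21400.batch): the JobID field contains a '.'.
step_lines() {
    awk -F'|' '$1 ~ /\./'
}
```

For example: sacct -j 21400 -P -n --format=JobID,JobName,State | step_lines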

To get more detailed information, use the --format= option. A useful command is

$ sacct -j 21400 --format=JobID,User,Node,AllocTRES%30,Elapsed,MaxRSS
       JobID      User        NodeList                      AllocTRES    Elapsed     MaxRSS 
------------ --------- --------------- ------------------------------ ---------- ---------- 
21400         abcd1234         mpcl001         cpu=8,mem=3994M,node=1   00:02:11            
21400.batch                    mpcl001         cpu=8,mem=3994M,node=1   00:02:11   4048152K 

which lists the JobID, the user name, the node list, the resources allocated to the job, the elapsed (wallclock) time, and the peak memory used (MaxRSS), which is reported for the job step (here the .batch line) rather than for the job line. In the example, the memory used, 4048152K (= 3953.3M), is just slightly below the requested 3994M.
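The conversion above divides by 1024 (4048152 / 1024 ≈ 3953.3). A minimal bash sketch of this arithmetic, assuming sacct's K/M/G/T suffixes are binary multiples (consistent with the numbers above); the helper names to_mib and mem_usage_percent are hypothetical:

```bash
# Hypothetical helper: convert a sacct memory value such as "4048152K"
# or "3994M" to MiB, assuming binary multiples (1K = 1024 bytes).
# Values without a K/M/G/T suffix are rejected (exit status 1).
to_mib() {
    awk -v v="$1" 'BEGIN {
        unit = toupper(substr(v, length(v), 1))   # last character = unit
        num  = substr(v, 1, length(v) - 1) + 0    # numeric part
        if      (unit == "K") f = 1 / 1024
        else if (unit == "M") f = 1
        else if (unit == "G") f = 1024
        else if (unit == "T") f = 1024 * 1024
        else exit 1
        printf "%.1f\n", num * f
    }'
}

# Hypothetical helper: used memory as a percentage of the requested memory,
# e.g. MaxRSS from the step line and mem= from AllocTRES.
mem_usage_percent() {
    local used requested
    used=$(to_mib "$1") || return 1
    requested=$(to_mib "$2") || return 1
    awk -v u="$used" -v r="$requested" 'BEGIN { printf "%.1f\n", 100 * u / r }'
}
```

With the values from the example, mem_usage_percent 4048152K 3994M prints 99.0, i.e. the job used about 99% of its memory request. Recent Slurm versions can also scale memory values themselves with sacct --units=M.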

The --format= option can be extended with additional fields as desired. The width of an output field can be changed by appending %n, as done above for AllocTRES. The complete list of available fields is printed by the command:

$ sacct -e
AllocCPUS         AllocGRES         AllocNodes        AllocTRES        
Account           AssocID           AveCPU            AveCPUFreq       
AveDiskRead       AveDiskWrite      AvePages          AveRSS           
AveVMSize         BlockID           Cluster           Comment          
ConsumedEnergy    ConsumedEnergyRaw CPUTime           CPUTimeRAW       
DerivedExitCode   Elapsed           Eligible          End              
ExitCode          GID               Group             JobID            
JobIDRaw          JobName           Layout            MaxDiskRead      
MaxDiskReadNode   MaxDiskReadTask   MaxDiskWrite      MaxDiskWriteNode 
MaxDiskWriteTask  MaxPages          MaxPagesNode      MaxPagesTask     
MaxRSS            MaxRSSNode        MaxRSSTask        MaxVMSize        
MaxVMSizeNode     MaxVMSizeTask     MinCPU            MinCPUNode       
MinCPUTask        NCPUS             NNodes            NodeList         
NTasks            Priority          Partition         QOS              
QOSRAW            ReqCPUFreq        ReqCPUFreqMin     ReqCPUFreqMax    
ReqCPUFreqGov     ReqCPUS           ReqGRES           ReqMem           
ReqNodes          ReqTRES           Reservation       ReservationId    
Reserved          ResvCPU           ResvCPURAW        Start            
State             Submit            Suspended         SystemCPU        
Timelimit         TotalCPU          UID               User             
UserCPU           WCKey             WCKeyID          

Some of the more interesting ones are highlighted in bold.
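Several of these fields can be combined into derived figures. As one example (a common heuristic, not something prescribed on this page), the CPU efficiency of a job can be estimated as TotalCPU / (Elapsed × AllocCPUS). The bash sketch below assumes Elapsed is printed as [D-]HH:MM:SS and TotalCPU as [D-][HH:]MM:SS[.mmm]; the function names to_seconds and cpu_efficiency are hypothetical.

```bash
# Hypothetical helper: convert a sacct time value ([D-][HH:]MM:SS[.mmm])
# to seconds.
to_seconds() {
    awk -v t="$1" 'BEGIN {
        days = 0
        if (index(t, "-")) {                 # optional "D-" day prefix
            split(t, d, "-"); days = d[1]; t = d[2]
        }
        n = split(t, p, ":")                 # remaining [HH:]MM:SS[.mmm]
        secs = 0
        for (i = 1; i <= n; i++) secs = secs * 60 + p[i]
        printf "%.3f\n", days * 86400 + secs
    }'
}

# Hypothetical helper: CPU efficiency in percent,
# 100 * TotalCPU / (Elapsed * AllocCPUS).
cpu_efficiency() {
    local cpu wall
    cpu=$(to_seconds "$1")
    wall=$(to_seconds "$2")
    awk -v c="$cpu" -v w="$wall" -v n="$3" 'BEGIN {
        if (w <= 0 || n <= 0) exit 1         # avoid division by zero
        printf "%.1f\n", 100 * c / (w * n)
    }'
}
```

With Elapsed = 00:02:11 and AllocCPUS = 8 from the example job and a hypothetical TotalCPU of 00:13:06, cpu_efficiency 00:13:06 00:02:11 8 prints 75.0 (786 s of CPU time out of 8 × 131 s available).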