Hi

While collecting statistics on CPU efficiency, I noticed that sacct reports
inexplicably (at least to me) high values for TotalCPU, UserCPU and
SystemCPU. Here is a simple example (each job step is an infinite while loop):


sacct -j 64338003 --format=jobid,elapsed,ncpus,cputime,totalcpu,usercpu,systemcpu,nodelist

       JobID    Elapsed      NCPUS    CPUTime   TotalCPU    UserCPU  SystemCPU        NodeList
------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------------
64338003       00:02:29          4   00:09:56   13:19:41   13:19:36  00:05.054        anode033
64338003.ba+   00:02:31          4   00:10:04  00:09.017  00:04.003  00:05.014        anode033
64338003.ex+   00:02:30          4   00:10:00  00:00.001   00:00:00  00:00.001        anode033
64338003.0     00:02:32          1   00:02:32   03:19:52   03:19:52  00:00.013        anode033
64338003.1     00:02:32          1   00:02:32   03:19:54   03:19:54  00:00.008        anode033
64338003.2     00:02:32          1   00:02:32   03:19:53   03:19:53  00:00.010        anode033
64338003.3     00:02:32          1   00:02:32   03:19:52   03:19:52  00:00.007        anode033


I would expect CPUTime (Elapsed x NCPUS) to be an upper bound for TotalCPU.
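
To make the mismatch concrete, here is a minimal sketch that converts the
sacct time strings from the job line above into seconds and compares
TotalCPU against Elapsed x NCPUS. The helper name slurm_time_to_seconds is
hypothetical (it is not part of Slurm), and the parser only assumes the
[DD-]HH:MM:SS and MM:SS.mmm layouts seen in this output.

```python
def slurm_time_to_seconds(text):
    """Convert a sacct duration like '13:19:41' or '00:05.054' to seconds.

    Hypothetical helper; handles [DD-]HH:MM:SS and MM:SS.mmm only.
    """
    days = 0
    if "-" in text:
        # Optional leading day count, e.g. '1-02:03:04'
        day_part, text = text.split("-", 1)
        days = int(day_part)
    parts = [float(p) for p in text.split(":")]
    # Pad missing leading fields (MM:SS.mmm has no hours field)
    while len(parts) < 3:
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


# Values copied from the job line (64338003) in the sacct output
elapsed = slurm_time_to_seconds("00:02:29")
ncpus = 4
cputime = elapsed * ncpus            # expected upper bound for TotalCPU
totalcpu = slurm_time_to_seconds("13:19:41")

print(f"CPUTime  = {cputime:.0f} s")
print(f"TotalCPU = {totalcpu:.0f} s")
print(f"TotalCPU exceeds CPUTime: {totalcpu > cputime}")
```

With these numbers CPUTime is 596 s (00:09:56, matching sacct), while the
reported TotalCPU is 47981 s.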


Looking at cpuacct.stat for job step 3:


cat /cgroup/cpuacct/slurm/uid_6994/job_64338003/step_3/cpuacct.stat

user 14902       (in USER_HZ ticks: ~149 s = 00:02:29)

system 0


This value corresponds to the expected CPU usage of a single job step.
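
For reference, a small sketch of the conversion I used for the annotation
above. cpuacct.stat reports its counters in USER_HZ ticks; the sketch
assumes USER_HZ is 100 (the usual value on Linux, which can be checked with
`getconf CLK_TCK`). The function name parse_cpuacct_stat is hypothetical.

```python
def parse_cpuacct_stat(text, user_hz=100):
    """Return {'user': seconds, 'system': seconds} from cpuacct.stat content.

    user_hz=100 is an assumption; check `getconf CLK_TCK` on the node.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        name, ticks = fields
        result[name] = int(ticks) / user_hz
    return result


# Content of cpuacct.stat for step_3, as shown above (annotation removed)
stat_text = "user 14902\nsystem 0\n"
usage = parse_cpuacct_stat(stat_text)
print(usage)
```

This gives about 149 s of user time for the step, which is consistent with
its elapsed time on a single CPU.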


We are running Slurm 18.08.4 with

JobAcctGatherType=jobacct_gather/cgroup


Does anyone have an explanation for those high values reported by sacct?



Best,

Nico


Universitaet Bern
Abt. Informatikdienste

Nico Färber
High Performance Computing

Gesellschaftsstrasse 6
CH-3012 Bern
Raum 104
Tel. +41 (0)31 631 51 89
