I'm very new to SLURM and have not used sreport before, so I decided
to try your queries myself to see what they do.

I am running 20.11.3, and for a very simple case that I could
"eyeball", the sreport numbers match the accounting data for me.

Looking just at the day 2021-03-09 for user mu40 on account lcn:

# sreport -t minutes -T CPU -nP cluster \
  AccountUtilizationByUser start='2021-03-09' end='2021-03-10' \
  account=lcn format=login,used
|40333
cx88|33835
mu40|6498

# sreport -t minutes -T gres/gpu -nP cluster \
  AccountUtilizationByUser start='2021-03-09' end='2021-03-10' \
  account=lcn format=login,used
|13070
cx88|9646
mu40|3425

# sacct --user=mu40 --starttime=2021-03-09 --endtime=2021-03-10 \
  --account=lcn -o jobid,start,end,elapsed,alloctres%80

       JobID               Start                 End    Elapsed                                   AllocTRES
------------ ------------------- ------------------- ---------- -----------------------------------------------------
190682       2021-03-05T16:25:55 2021-03-12T09:20:52 6-16:54:57 billing=10,cpu=3,gres/gpu=2,mem=24G,node=1
190682.batch 2021-03-05T16:25:55 2021-03-12T09:20:53 6-16:54:58 cpu=3,gres/gpu=2,mem=24G,node=1
190682.exte+ 2021-03-05T16:25:55 2021-03-12T09:20:52 6-16:54:57 billing=10,cpu=3,gres/gpu=2,mem=24G,node=1
201123       2021-03-09T14:55:20 2021-03-09T14:55:23   00:00:03 billing=9,cpu=4,gres/gpu=1,mem=96G,node=1
201123.exte+ 2021-03-09T14:55:20 2021-03-09T14:55:23   00:00:03 billing=9,cpu=4,gres/gpu=1,mem=96G,node=1
201123.0     2021-03-09T14:55:20 2021-03-09T14:55:23   00:00:03 cpu=4,gres/gpu=1,mem=96G,node=1
201124       2021-03-09T14:55:29 2021-03-10T08:13:07   17:17:38 billing=18,cpu=4,gres/gpu=1,mem=512G,node=1
201124.exte+ 2021-03-09T14:55:29 2021-03-10T08:13:07   17:17:38 billing=18,cpu=4,gres/gpu=1,mem=512G,node=1
201124.0     2021-03-09T14:55:29 2021-03-10T08:13:07   17:17:38 cpu=4,gres/gpu=1,mem=512G,node=1

So the first job used all 24 hours of that day, the second just 3
seconds (so ignore it), and the third about 9 hours and 5 minutes:

CPU = 24*60*3+(9*60+5)*4 = 6500

GPU = 24*60*2+(9*60+5)*1 = 3425
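
For what it's worth, the small gap between my rounded 6500 and
sreport's 6498 goes away if you use the exact window: the third job
only covers 2021-03-09 from 14:55:29 to midnight, about 544.5 minutes,
so 4320 + 4*544.5 is roughly 6498 CPU-minutes and 2880 + 544.5 is
roughly 3425 GPU-minutes. Here is a quick Python sketch of that
clamp-to-the-day arithmetic, with the job windows hard-coded from the
sacct output above (just my own back-of-the-envelope check, not
anything sreport does for you):

from datetime import datetime

# Reporting window: the single day 2021-03-09.
day_start = datetime(2021, 3, 9)
day_end   = datetime(2021, 3, 10)

# (start, end, cpus, gpus) taken from the sacct output above.
jobs = [
    (datetime(2021, 3, 5, 16, 25, 55), datetime(2021, 3, 12, 9, 20, 52), 3, 2),  # 190682
    (datetime(2021, 3, 9, 14, 55, 20), datetime(2021, 3, 9, 14, 55, 23), 4, 1),  # 201123
    (datetime(2021, 3, 9, 14, 55, 29), datetime(2021, 3, 10, 8, 13, 7),  4, 1),  # 201124
]

cpu_min = gpu_min = 0.0
for start, end, cpus, gpus in jobs:
    # Only the part of each job that overlaps the day counts.
    overlap = max((min(end, day_end) - max(start, day_start)).total_seconds() / 60, 0.0)
    cpu_min += overlap * cpus
    gpu_min += overlap * gpus

print(round(cpu_min), round(gpu_min))   # 6498 3425 -- matching sreport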



-- Paul Raines (http://help.nmr.mgh.harvard.edu)



On Thu, 11 Mar 2021 11:03pm, Miguel Oliveira wrote:

Dear all,

I hope you can help me!
In our facility we support users via projects that have time
allocations. For this we use a simple bank facility that we developed
along the lines of the old slurm-bank code
(https://jcftang.github.io/slurm-bank/). Our implementation differs in
that we have a QOS per project with the NoDecay flag. The basic
commands used are (sketched below):
- scontrol show assoc_mgr to read the limits,
- sacctmgr modify qos to modify the limits, and
- sreport to read individual usage.
We have been using this in production for a while without a single
issue for CPU time allocations.
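
To give an idea, the first two commands are driven along these lines
(the QOS name and limit value below are only illustrative, and the
real code is more involved):

import subprocess

qos = "proj_alpha"     # illustrative project QOS name
gpu_minutes = 200000   # illustrative GPU-minute budget

# Read the limits and usage the controller tracks for this QOS
# (NoDecay, so the usage does not decay over time).
assoc = subprocess.run(
    ["scontrol", "show", "assoc_mgr", "qos=" + qos, "flags=qos"],
    capture_output=True, text=True, check=True,
).stdout

# Set or change the GPU-minute budget on the QOS.
subprocess.run(
    ["sacctmgr", "-i", "modify", "qos", "where", "name=" + qos,
     "set", "GrpTRESMins=gres/gpu=" + str(gpu_minutes)],
    check=True,
)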

Now we need to implement GPU time allocation as well for our new GPU
partition. The first two commands work fine to set or change the
limits with gres/gpu, but the values we get from sreport do not add up.
In this case we use:

- command='sreport -t minutes -T gres/gpu -nP cluster AccountUtilizationByUser 
start='+date_start+' end='+date_end+' account='+account+' format=login,used'
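
The -nP output is then parsed into login|used pairs, roughly along
these lines (the account and dates below are only placeholders):

import subprocess

# Placeholder values; the real script fills these in.
date_start, date_end, account = "2021-03-01", "2021-04-01", "project_a"

out = subprocess.run(
    ["sreport", "-t", "minutes", "-T", "gres/gpu", "-nP", "cluster",
     "AccountUtilizationByUser",
     "start=" + date_start, "end=" + date_end,
     "account=" + account, "format=login,used"],
    capture_output=True, text=True, check=True,
).stdout

# With -nP each line is "login|used" (minutes); the line with an empty
# login is the account-level rollup, the rest are per-user figures.
usage = {}
for line in out.splitlines():
    login, used = line.split("|")
    usage[login or "TOTAL"] = int(used)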

We have confirmed against the accounting records that the total
reported by scontrol show assoc_mgr is correct, while the value given
by sreport is totally off. Did I misunderstand the sreport man page
and the command above reports something else, or is this a bug? We do
something similar with "-T cpu" for the CPU part of the code, and
there the numbers match up. We are using Slurm 20.02.0.

Best Regards,

MAO

---
Miguel Afonso Oliveira
Laboratório de Computação Avançada | Laboratory for Advanced Computing
Universidade de Coimbra | University of Coimbra
T: +351239410681
E: miguel.olive...@uc.pt
W: www.uc.pt/lca


