Hi Marcus and Bjørn-Helge,

Thank you for your answers.
We don't use Slurm billing; we use our system accounting (acct) for billing.

I also confirm that with --exclusive there is a difference between ReqCPUS and AllocCPUS. Until now, --mem-per-cpu behaved more like a --mem-per-task: it was tied to ReqCPUS. It looks like it is now tied to AllocCPUS (see the P.S. below for a quick way to check this with sacct). If this is intended behaviour rather than a side effect, why are such jobs not rejected instead of being accepted and left Pending forever?

The behaviour is the same in 19.05.2 but corrected in 19.05.3, so the problem seems to be known and fixed in v19 but not in v18.

Sincerely,
Béatrice

> On 12 Dec 2019, at 12:10, Marcus Wagner <wag...@itc.rwth-aachen.de> wrote:
>
> Hi Beatrice and Bjørn-Helge,
>
> I can confirm that it works with 18.08.7. We additionally use
> TRESBillingWeights together with PriorityFlags=MAX_TRES. For example:
> TRESBillingWeights="CPU=1.0,Mem=0.1875G,gres/gpu=12.0"
> We use the billing factor for our external accounting, in order to
> account node usage fairly. But we do see a similar effect due to
> --exclusive.
> In Beatrice's case, the billing weights would be:
> TRESBillingWeights="CPU=1.0,Mem=0.21875G"
> So, a 10 cpu job with 1 GB per cpu would be billed 10.
> A 1 cpu job with 10 GB would be billed 2 (0.21875 * 10, floored).
> An exclusive 10 cpu job with 1 GB per cpu would be billed 28 (all 28
> cores go to the job).
> An exclusive 1 cpu job with 30 GB per cpu (Beatrice's example) would be
> billed 28 (cores) * 30 (GB) * 0.21875 => 183.75 => 183 core-equivalents.
>
> Best
> Marcus
>
> On 12/12/19 9:47 AM, Bjørn-Helge Mevik wrote:
>> Beatrice Charton <beatrice.char...@criann.fr> writes:
>>
>>> Hi,
>>>
>>> We have a strange behaviour of Slurm after updating from 18.08.7 to
>>> 18.08.8, for jobs using --exclusive and --mem-per-cpu.
>>>
>>> Our nodes have 128 GB of memory and 28 cores.
>>> $ srun --mem-per-cpu=30000 -n 1 --exclusive hostname
>>> => works in 18.08.7
>>> => doesn't work in 18.08.8
>>
>> I'm actually surprised it _worked_ in 18.08.7. At one time - long
>> before v18.08 - the behaviour was changed when using --exclusive: in
>> order to account the job for all CPUs on the node, the number of CPUs
>> asked for with --ntasks would simply be multiplied by
>> "#cpus-on-node / --ntasks" (so in your case: 28). Unfortunately, that
>> also means that the memory the job requires per node is
>> "#cpus-on-node / --ntasks" multiplied by --mem-per-cpu (in your case
>> 28 * 30000 MiB ~= 820 GiB). For this reason, we tend to ban
>> --exclusive on our clusters (or at least warn about it).
>>
>> I haven't looked at the code for a long time, so I don't know whether
>> this is still the current behaviour, but every time I've tested, I've
>> seen the same problem. I believe I've tested on 19.05 (but I might
>> remember wrong).
>
> --
> Marcus Wagner, Dipl.-Inf.
>
> IT Center
> Abteilung: Systeme und Betrieb
> RWTH Aachen University
> Seffenter Weg 23
> 52074 Aachen
> Tel: +49 241 80-24383
> Fax: +49 241 80-624383
> wag...@itc.rwth-aachen.de
> www.itc.rwth-aachen.de

--
Béatrice CHARTON | CRIANN
beatrice.char...@criann.fr | 745, avenue de l'Université
Tel : +33 (0)2 32 91 42 91 | 76800 Saint Etienne du Rouvray
--- Support : supp...@criann.fr ---
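P.S. For anyone who wants to check which CPU count --mem-per-cpu ends up
tied to on their own version, something like the following should work
(this is the sbatch variant of my srun line, so it returns instead of
blocking; the job ID 1234 is just a placeholder):

$ sbatch --mem-per-cpu=30000 -n 1 --exclusive --wrap hostname
$ squeue -u $USER    # on 18.08.8 the job sits in PD (Pending) forever
$ sacct -j 1234 --format=JobID,State,ReqCPUS,AllocCPUS,ReqMem

On these versions the c/n suffix of ReqMem shows whether the memory
request was recorded per core or per node, and comparing ReqCPUS with
AllocCPUS shows the effect of --exclusive.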
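P.P.S. A back-of-the-envelope check of Marcus' billing figures, assuming
MAX_TRES simply takes the floored maximum of the weighted TRES (my
reading of his examples, not the actual Slurm code):

$ awk 'function bill(cpus, gb) { v = (cpus > gb * 0.21875) ? cpus : gb * 0.21875; return int(v) }
  BEGIN { print bill(10, 10), bill(1, 10), bill(28, 10), bill(28, 28 * 30) }'
10 2 28 183

The last value is the exclusive 1 cpu, 30 GB per cpu job billed over all
28 allocated cores.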