Hoping someone can take a look at this one. I ended up changing the partition in question to use only 1 thread per core to keep things simple, but it would still be nice to know why Slurm is enforcing the limit against TRES minutes instead of RawUsage.
thanks.
-John

On Wed, Nov 15, 2017 at 10:55 AM, John Roberts <roberts.johne...@gmail.com> wrote:
> Hi,
>
> I'm having an issue with accounts in slurm and not sure if I'm missing
> something. Here's a quick breakdown of the issue:
>
> We have many accounts in Slurm (v16.05.10) / SlurmDBD. We recently set 1
> partition's billing weight to 0.25. This partition has 64 cores with 4
> threads per node. We set this weight to 0.25 so we don't bill for threads,
> just core hours. This part seems to be working ok.
>
> When querying the account balance via RawUsage (and we use sbank to
> present this to the user in readable hours), these numbers look right. They
> come out to a quarter of full node.
>
> However, when querying say "UserUtilizationByAccount", this number is
> about 4 times as much. This also makes sense because they are technically
> being allocated for all cores and threads, but we only expect to bill for a
> quarter of the time.
>
> The problem arose when a user of this account tried to submit a job and it
> sat in the queue with the error "AssocGrpCPUMinutesLimit".
>
> Turning up the debug logs showed this:
>
> "debug2: Job 161868 being held, the job is at or exceeds assoc
> 2159(<foo>/(null)/(null)) group max tres(cpu) minutes of 150000000 of which
> 27718972 are still available but request is for 94371840 (plus 0 already in
> use) tres minutes (request tres count 65536)"
>
> The available number above "27718972" matches what the balance would have
> been from the max CPU minutes minus the usage from
> "UserUtilizationByAccount" instead of reporting the real balance of 4x that
> number.
>
> Why would Slurm be trying to schedule jobs based on this number instead of
> RawUsage? If we're billing it lower, RawUsage should be the true balance,
> but that doesn't seem to be the case.
>
> thanks!
> -John
>
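For anyone who wants to check the numbers, here is a rough sketch of the arithmetic behind that debug2 line. It only uses the figures quoted in the log message and the 0.25 billing weight; it is not how slurmctld computes anything internally. The "weighted" values assume the billing weight would be applied to both the accumulated usage and the new request, which is my expectation and not confirmed behaviour. All variable names are made up for illustration.

```python
# Figures copied from the debug2 log line quoted above.
LIMIT_MIN = 150_000_000      # group max tres(cpu) minutes
AVAILABLE_MIN = 27_718_972   # minutes reported as still available
REQUEST_MIN = 94_371_840     # tres minutes requested by job 161868
REQUEST_CPUS = 65_536        # request tres count
BILLING_WEIGHT = 0.25        # partition billing weight from the post

# Usage implied by the log line (unweighted CPU minutes).
used_unweighted = LIMIT_MIN - AVAILABLE_MIN

# Wall time implied by the request: tres minutes / tres count.
walltime_min = REQUEST_MIN // REQUEST_CPUS

# Assumption: what the balance would be if the 0.25 weight were applied.
used_weighted = used_unweighted * BILLING_WEIGHT
available_weighted = LIMIT_MIN - used_weighted
request_weighted = REQUEST_MIN * BILLING_WEIGHT

# The job is held when the request exceeds what is still available.
held_unweighted = REQUEST_MIN > AVAILABLE_MIN
held_weighted = request_weighted > available_weighted

print(f"implied unweighted usage : {used_unweighted}")
print(f"implied wall time        : {walltime_min} min")
print(f"weighted balance (assumed): {available_weighted:.0f}")
print(f"held on unweighted check : {held_unweighted}")
print(f"held on weighted check   : {held_weighted}")
```

With the unweighted figures the request is larger than the remaining balance, which matches the job being held. Under the assumed weighted accounting the same job would fit.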