I have not had a chance to play with the newest Slurm, but I would suggest looking at GrpTRESRaw, which is supposed to gather the usage by TRES (in TRES-minutes). So if there is a billing TRES in GrpTRESRaw, that might be what you want.
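
If GrpTRESRaw does carry a billing entry, the "available" figure should just be the GrpTRESMins limit minus that entry. Here is a rough Python sketch of that arithmetic, untested against a live cluster. The helper names parse_tres and remaining_billing_minutes are made up for illustration, and the GrpTRESRaw string is a hypothetical value chosen to match the 60 minutes your log reports as still available; only the comma-separated name=value layout is taken from your sshare output.

```python
def parse_tres(text):
    """Parse a TRES string such as 'cpu=60,mem=0,billing=0' into a dict of floats."""
    result = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the first '=' only, so names like 'gres/gpu' stay intact.
        name, _, value = item.partition("=")
        result[name.strip()] = float(value)
    return result


def remaining_billing_minutes(grp_tres_mins, grp_tres_raw):
    """Return limit minus usage for the billing TRES, or None if no limit is set."""
    limit = parse_tres(grp_tres_mins).get("billing")
    if limit is None:
        return None
    used = parse_tres(grp_tres_raw).get("billing", 0.0)
    return limit - used


# GrpTRESMins as shown in the sshare output quoted below.
limit_field = "billing=120"
# Hypothetical GrpTRESRaw value (assumption, not taken from real output).
raw_field = "cpu=60,mem=0,energy=0,node=60,billing=60"

print(remaining_billing_minutes(limit_field, raw_field))  # 60.0
```

You would feed it the two columns from something like "sshare -A test -l -o GrpTRESMins,GrpTRESRaw"; I am assuming the values come back as plain numbers in minutes.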

On Fri, Apr 27, 2018 at 11:21 AM, Roberts, John E. <jerobe...@anl.gov> wrote:
> Hi,
>
> I'm testing the newest version of Slurm and I'm seeing an issue when using
> the newer billing TRES to charge for cpu time on a partition. I've seen
> that billing should be used now instead of cpu in order to properly use the
> "TRESBillingWeights" option on a partition.
>
> In my test case, I gave an account 2 hours of billing time. I used 1 hour
> of this while setting the partition to TRESBillingWeights="CPU=1.0". It
> seemed to have billed properly.
> Next, I set on the same partition TRESBillingWeights="CPU=0.5". I ran
> several jobs, but the billing never seemed to increase. RawUsage, however,
> did increment correctly.
>
> Here's an example of sshare reporting no billing run minutes, when
> CPU=0.5 and I start a job with a walltime of 1 hour. Even though the
> RawUsage is well past 2 hours, a job can still run when it shouldn't.
>
> # sshare -A test -l -o RawUsage,GrpTRESMins,TRESRunMins%60
>    RawUsage                    GrpTRESMins                                           TRESRunMins
> ----------- ------------------------------ -----------------------------------------------------
>       11068                    billing=120               cpu=60,mem=0,energy=0,node=60,billing=0
>
> If I set CPU=1.0 and start say a job for 2 hours, I get this in the logs:
> debug2: Job 32 being held, the job is at or exceeds assoc 239(test/(null)/(null)) group max tres(billing) minutes of 120 of which 60 are still available but request is for 120 (plus 0 already in use) tres minutes (request tres count 1)
>
> This makes sense because I previously ran a job at the weight of 1.0 for
> an hour so it "billed" for 1 hour at that time. How can I query the
> "available" billing hours if it's not RawUsage?
>
> Going back to setting billing CPU weight to 0.5, the logs seem to be
> inconsistent too. In this first line, it shows the right thing:
> debug: TRES Weight: cpu = 1.000000 * 0.500000 = 0.500000
>
> but not a few lines down:
> debug2: acct_policy_job_begin: after adding job 45, assoc 239(test/(null)/(null)) grp_used_tres_run_secs(billing) is 0
>
> Again, RawUsage increases correctly, but Slurm is using some other field
> for billing to determine if a job can run.
>
> My questions are: How can I set CPU billing to be less than 1 and how can
> I make sure jobs don't run if they are out of time in this case? What is
> Slurm using for billing, because it's clearly not RawUsage? Am I simply
> misunderstanding the billing and/or weights fields?
>
> Thanks for any help...
>
>

--
Tom Payerle
DIT-ACIGS/Mid-Atlantic Crossroads        paye...@umd.edu
5825 University Research Park            (301) 405-6135
University of Maryland
College Park, MD 20740-3831