That’s close to what I’m looking for, but I’d like to set the per-partition limit 
on an account rather than on a user.  Something like:

sacctmgr modify account name=gbstest partition=batch  grpjobs=1

Using sacctmgr to add a partition to a user association works fine; unfortunately, 
partition isn’t one of the options when modifying an account.

Any ideas for applying a limit at the account+partition level rather than at the 
account+user+partition level?
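
For reference, the per-user form does work; the associations shown in the 
sacctmgr output below were created with something along these lines (exact 
commands reconstructed from memory):

sacctmgr add user name=gbs35 account=gbstest partition=burst
sacctmgr add user name=gbs35 account=gbstest partition=batch grpjobs=1

What I’m after is the same limit one level up, without having to repeat it for 
every user in the account.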

Setting limits at the user level seems to work as expected, unless I submit a 
job that lists multiple partitions.

I have two partitions, batch and burst, and I set a limit of grpjobs=1 on the 
batch association.  When I submit jobs to partition “batch,burst”, more than 
one job starts in the batch partition.  I thought the extra jobs would have to 
go into the “burst” partition.
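
The submissions themselves are nothing special; roughly the following (the 
sleep and the options are just my test case, reconstructed to match the squeue 
output below):

sbatch --account=gbstest --partition=batch,burst --ntasks=1 --time=30:00 --wrap="sleep 1800"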

Here’s an example of what I’m seeing.

[gbs35@sltest ~]$ grep PartitionName=b /etc/slurm/slurm.conf
PartitionName=batch Nodes=ALL Default=no MaxTime=INFINITE State=UP CpuBind=core OverSubscribe=no DenyAccounts=open PriorityTier=100
PartitionName=burst Nodes=ALL Default=no MaxTime=INFINITE State=UP CpuBind=core OverSubscribe=no DenyAccounts=open PriorityTier=50
[gbs35@sltest ~]$ sacctmgr show account name=gbstest withass format=account,cluster,partition,user,grpcpus,grpjobs
   Account    Cluster  Partition       User  GrpCPUs GrpJobs
---------- ---------- ---------- ---------- -------- -------
   gbstest     sltest
   gbstest     sltest      burst      gbs35
   gbstest     sltest      batch      gbs35                1
[gbs35@sltest ~]$ squeue -a
       JOBID       USER       PARTITION NODES  CPUS ST  TIME_LEFT          START_TIME NODELIST(R
          91      gbs35     batch,burst     1     1 PD      30:00 2019-01-16T11:19:00 (Nodes req
          90      gbs35     batch,burst     1     1 PD      30:00 2019-01-16T10:49:00 (Nodes req
          89      gbs35     batch,burst     1     1 PD      30:00 2019-01-16T10:19:22 (Nodes req
          88      gbs35           batch     1     1  R      27:37 2019-01-16T09:56:36 sltest
          83      gbs35           batch     1     1  R      20:23 2019-01-16T09:49:22 sltest
          84      gbs35           batch     1     1  R      20:23 2019-01-16T09:49:22 sltest
          85      gbs35           batch     1     1  R      20:23 2019-01-16T09:49:22 sltest
          86      gbs35           batch     1     1  R      20:23 2019-01-16T09:49:22 sltest
          87      gbs35           batch     1     1  R      20:23 2019-01-16T09:49:22 sltest
          81      gbs35           batch     1     1  R      20:20 2019-01-16T09:49:19 sltest
          82      gbs35           batch     1     1  R      20:20 2019-01-16T09:49:19 sltest



Taking a look at another job, it appears that the limit/usage accounting is 
being charged to the association for the wrong partition for this job.  From 
slurmctld.log I see:

[2019-01-16T10:41:22.883] debug:  sched: Running job scheduler
[2019-01-16T10:41:22.883] debug2: found 1 usable nodes from config containing sltest
[2019-01-16T10:41:22.883] debug3: _pick_best_nodes: JobId=184 idle_nodes 1 share_nodes 1
[2019-01-16T10:41:22.883] debug2: select_p_job_test for JobId=184
[2019-01-16T10:41:22.883] debug5: powercapping: checking JobId=184 : skipped, capping disabled
[2019-01-16T10:41:22.883] debug3: select/cons_res: _add_job_to_res: JobId=184 act 0
[2019-01-16T10:41:22.883] debug3: select/cons_res: adding JobId=184 to part batch row 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, qos normal grp_used_tres_run_secs(cpu) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, qos normal grp_used_tres_run_secs(mem) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, qos normal grp_used_tres_run_secs(node) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, qos normal grp_used_tres_run_secs(billing) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, qos normal grp_used_tres_run_secs(fs/disk) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, qos normal grp_used_tres_run_secs(vmem) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, qos normal grp_used_tres_run_secs(pages) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 30(gbstest/gbs35/burst) grp_used_tres_run_secs(cpu) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 30(gbstest/gbs35/burst) grp_used_tres_run_secs(mem) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 30(gbstest/gbs35/burst) grp_used_tres_run_secs(node) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 30(gbstest/gbs35/burst) grp_used_tres_run_secs(billing) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 30(gbstest/gbs35/burst) grp_used_tres_run_secs(fs/disk) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 30(gbstest/gbs35/burst) grp_used_tres_run_secs(vmem) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 30(gbstest/gbs35/burst) grp_used_tres_run_secs(pages) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 27(gbstest/(null)/(null)) grp_used_tres_run_secs(cpu) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 27(gbstest/(null)/(null)) grp_used_tres_run_secs(mem) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 27(gbstest/(null)/(null)) grp_used_tres_run_secs(node) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 27(gbstest/(null)/(null)) grp_used_tres_run_secs(billing) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 27(gbstest/(null)/(null)) grp_used_tres_run_secs(fs/disk) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 27(gbstest/(null)/(null)) grp_used_tres_run_secs(vmem) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 27(gbstest/(null)/(null)) grp_used_tres_run_secs(pages) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 1(root/(null)/(null)) grp_used_tres_run_secs(cpu) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 1(root/(null)/(null)) grp_used_tres_run_secs(mem) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 1(root/(null)/(null)) grp_used_tres_run_secs(node) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 1(root/(null)/(null)) grp_used_tres_run_secs(billing) is 1800
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 1(root/(null)/(null)) grp_used_tres_run_secs(fs/disk) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 1(root/(null)/(null)) grp_used_tres_run_secs(vmem) is 0
[2019-01-16T10:41:22.884] debug2: acct_policy_job_begin: after adding JobId=184, assoc 1(root/(null)/(null)) grp_used_tres_run_secs(pages) is 0
[2019-01-16T10:41:22.884] debug3: sched: JobId=184 initiated
[2019-01-16T10:41:22.884] sched: Allocate JobId=184 NodeList=sltest #CPUs=1 Partition=batch

You can see that the job was allocated in Partition=batch, but the 
“acct_policy_job_begin” entries charge the usage to the (gbstest/gbs35/burst) 
association; I would have expected (gbstest/gbs35/batch).  Somewhere along the 
way, the pointer to the correct association isn’t being carried through.
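
If a cross-check of which association the controller is charging would help, 
something like the following should dump the relevant usage counters from 
slurmctld’s cache (assuming “scontrol show assoc_mgr” is available in your 
Slurm version):

scontrol show assoc_mgr accounts=gbstest users=gbs35 flags=assoc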

-----
Gary Skouson


From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Thomas M. Payerle
Sent: Tuesday, January 15, 2019 12:57 PM
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Accounting configuration

Generally, the add, modify, etc. sacctmgr commands want a "user" or "account" 
entity, but can modify associations through this.

E.g., if user baduser should have GrpTRESMins of cpu=1000 set on partition 
special, use something like

sacctmgr add user name=baduser partition=special account=testacct grptresmins=cpu=1000

if there is no association for that user, account and partition already, or

sacctmgr modify user where user=baduser partition=special set grptresmins=cpu=1000

To place the restriction on an account instead, add/modify the account with a 
partition field.



On Tue, Jan 15, 2019 at 11:33 AM Skouson, Gary <gb...@psu.edu> wrote:
Slurm accounting info is stored based on user, cluster, partition and account.  
I'd like to be able to enforce limits for an account based on the partition 
it's running in.

Sadly, I'm not seeing how to use sacctmgr to set the partition as part of the 
association.  The add, modify and delete operations seem to apply only to user, 
account and cluster entities.  How do I add a partition to a particular account 
association, and how do I set GrpTRES for an association that includes a 
partition?

I know I can change the partition configuration in slurm.conf and use 
AllowAccounts, but that doesn't change the usage limits on a partition for a 
particular account.
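
For concreteness, I mean a partition definition along these lines in 
slurm.conf, which controls which accounts may use the partition but says 
nothing about how much of it a given account may consume:

PartitionName=burst Nodes=ALL Default=no MaxTime=INFINITE State=UP AllowAccounts=gbstest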

Maybe there's another way to work around this that I'm missing.

I'd like to be able to use GrpTRESMins to limit overall cumulative account 
usage.  I also want to limit accounts to differing resources (GrpTRES) on some 
partitions (for preemption/priority, etc.).
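
The account-wide half of that seems straightforward, e.g. something along 
these lines (the cpu number here is just a placeholder):

sacctmgr modify account where account=gbstest cluster=sltest set GrpTRESMins=cpu=100000

It's the per-partition half that I can't see how to express.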

Thoughts?

-----
Gary Skouson





--
Tom Payerle
DIT-ACIGS/Mid-Atlantic Crossroads        paye...@umd.edu
5825 University Research Park               (301) 405-6135
University of Maryland
College Park, MD 20740-3831
