[slurm-users] Setting up fairshare accounting
Hello,

We have a new cluster and I'm trying to set up fairshare accounting. I'm trying to track CPU, MEM and GPU. It seems that billing for individual jobs is correct, but billing isn't being accumulated (TRESRunMins is always 0).

In my slurm.conf, I think the relevant lines are

AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageTRES=gres/gpu
PriorityFlags=MAX_TRES

PartitionName=gpu Nodes=node[1-7] MaxCPUsPerNode=384 MaxTime=7-0:00:00 State=UP TRESBillingWeights="CPU=1.0,MEM=0.125G,GRES/gpu=9.6"
PartitionName=cpu Nodes=node[1-7] MaxCPUsPerNode=182 MaxTime=7-0:00:00 State=UP TRESBillingWeights="CPU=1.0,MEM=0.125G,GRES/gpu=9.6"

I currently have one recently finished job and one running job. sacct gives

$ sacct --format=JobID,JobName,ReqTRES%50,AllocTRES%50,TRESUsageInAve%50,TRESUsageInMax%50
JobID         JobName     ReqTRES                                    AllocTRES                                  TRESUsageInAve                                      TRESUsageInMax
154           interacti+  billing=9,cpu=1,gres/gpu=1,mem=1G,node=1   billing=9,cpu=2,gres/gpu=1,mem=2G,node=1
154.interac+  interacti+                                             cpu=2,gres/gpu=1,mem=2G,node=1             cpu=00:00:00,energy=0,fs/disk=2480503,mem=3M,page+  cpu=00:00:00,energy=0,fs/disk=2480503,mem=3M,page+
155           interacti+  billing=9,cpu=1,gres/gpu=1,mem=1G,node=1   billing=9,cpu=2,gres/gpu=1,mem=2G,node=1
155.interac+  interacti+                                             cpu=2,gres/gpu=1,mem=2G,node=1

billing=9 seems correct to me, since I have 1 GPU allocated, which has the largest billing weight (9.6). However, sshare doesn't show anything in TRESRunMins:

$ sshare --format=Account,User,RawShares,FairShare,RawUsage,EffectvUsage,TRESRunMins%110
Account       User      RawShares  FairShare  RawUsage   EffectvUsage  TRESRunMins
root                                          21589714   1.00          cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
abrol_group             2000                  0          0.00          cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
luchko_group            2000                  21589714   1.00          cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
luchko_group  tluchko   1          0.33       21589714   1.00          cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0

Why is TRESRunMins all 0 but RawUsage is not for tluchko? I have checked and slurmdbd is running.

Thank you,

Tyler

Sent with [Proton Mail](https://proton.me/) secure email.

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
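For anyone comparing against their own setup: the values that drive this behaviour can be read back from the running daemons. The commands below are only a generic sketch (they are not taken from the thread; exact output varies by Slurm version):

# Sketch: confirm which accounting and priority options slurmctld is actually running with.
scontrol show config | grep -E 'AccountingStorageTRES|PriorityType|PriorityFlags|PriorityCalcPeriod|PriorityDecayHalfLife'

# Sketch: list the TRES the accounting database is tracking; gres/gpu should appear here.
sacctmgr show tres

With PriorityFlags=MAX_TRES, which bills the largest weighted per-node TRES, the billing for job 154 above works out as max(2 CPU x 1.0, 2 GiB x 0.125, 1 GPU x 9.6) = 9.6, consistent with the integer billing=9 that sacct reports.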
[slurm-users] Re: Setting up fairshare accounting
Just following up on my own message in case someone else is trying to figure out RawUsage and Fair Share.

I ran some additional tests, except that I ran jobs for 10 min instead of 1 min. The procedure was

1. Set the accounting stats to update every minute in slurm.conf:

   PriorityCalcPeriod=1

2. Reset the RawUsage stat:

   sacctmgr modify account luchko_group set RawUsage=0

3. Check the RawUsage every second:

   while sleep 1; do date; sshare -ao Account,User,RawShares,NormShares,RawUsage ; done > watch.out

4. Run a 10 min job. The billing per CPU is 1, so the total RawUsage should be 60,000 and the RawUsage should increase by 6,000 each minute (the arithmetic is spelled out in a short sketch after the quoted message below):

   sbatch --account=luchko_group --wrap="sleep 600" -p cpu -n 100

Scanning the output file, I can see that the RawUsage does update once every minute. Below are the updates. (I've removed irrelevant output.)

Tue Sep 24 10:14:24 AM PDT 2024
Account       User     RawShares  NormShares  RawUsage
luchko_group  tluchko  100        0.50        0
Tue Sep 24 10:14:25 AM PDT 2024
luchko_group  tluchko  100        0.50        4099
Tue Sep 24 10:15:24 AM PDT 2024
luchko_group  tluchko  100        0.50        10099
Tue Sep 24 10:16:25 AM PDT 2024
luchko_group  tluchko  100        0.50        16099
Tue Sep 24 10:17:24 AM PDT 2024
luchko_group  tluchko  100        0.50        22098
Tue Sep 24 10:18:25 AM PDT 2024
luchko_group  tluchko  100        0.50        28097
Tue Sep 24 10:19:24 AM PDT 2024
luchko_group  tluchko  100        0.50        34096
Tue Sep 24 10:20:25 AM PDT 2024
luchko_group  tluchko  100        0.50        40094
Tue Sep 24 10:21:24 AM PDT 2024
luchko_group  tluchko  100        0.50        46093
Tue Sep 24 10:22:25 AM PDT 2024
luchko_group  tluchko  100        0.50        52091
Tue Sep 24 10:23:24 AM PDT 2024
luchko_group  tluchko  100        0.50        58089
Tue Sep 24 10:24:25 AM PDT 2024
luchko_group           2000       0.133324    58087
Tue Sep 24 10:25:25 AM PDT 2024
luchko_group  tluchko  100        0.50        58085

So, the RawUsage does increase by the expected amount each minute, and the RawUsage does decay (I have the half-life set to 14 days). However, the update for the last part of a minute, which should be 1901, is not recorded. I suspect this is because the job is no longer running when the accounting update occurs. For typical jobs that run for hours or days, this is a negligible error, but it does explain the results I got when I ran a 1 min job.

TRESRunMins is still not updating, but that is only an inconvenience.

Tyler

Sent with [Proton Mail](https://proton.me/mail/home) secure email.

On Thursday, September 19th, 2024 at 8:47 PM, tluchko via slurm-users wrote:

> Hello,
>
> I'm hoping someone can offer some suggestions.
>
> I went ahead and started the database from scratch and reinitialized it to see if that would help and to try and understand how RawUsage is calculated. I ran two jobs of
>
> sbatch --account=luchko_group --wrap="sleep 60" -p cpu -n 100
>
> With the partition defined as
>
> PriorityFlags=MAX_TRES
> PartitionName=cpu Nodes=node[1-7] MaxCPUsPerNode=182 MaxTime=7-0:00:00 State=UP TRESBillingWeights="CPU=1.0,MEM=0.125G,GRES/gpu=9.6"
>
> I expected each job to contribute 6000 to the RawUsage; however, one job contributed 3100 and the other 2800. And TRESRunMins stayed at 0 for all categories.
>
> I'm at a loss as to what is going on.
>
> Thank you,
>
> Tyler
>
> Sent with [Proton Mail](https://proton.me/mail/home) secure email.
>
> On Tuesday, September 10th, 2024 at 9:03 PM, tluchko wrote:
>
>> Hello,
>>
>> We have a new cluster and I'm trying to set up fairshare accounting. I'm trying to track CPU, MEM and GPU. It seems that billing for individual jobs is correct, but billing isn't being accumulated (TRESRunMins is always 0).
>>
>> In my slurm.conf, I think the relevant lines are
>>
>> AccountingStorageType=accounting_storage/slurmdbd
>> AccountingStorageTRES=gres/gpu
>> PriorityFlags=MAX_TRES
>>
>> PartitionName=gpu Nodes=node[1-7] MaxCPUsPerNode=384 MaxTime=7-0:00:00 State=UP TRESBillingWeights="CPU=1.0,MEM=0.125G,GRES/gpu=9.6"
>> PartitionName=cpu Nodes=node[1-7] MaxCPUsPerNode=182 MaxTime=7-0:00:00 State=UP TRESBillingWeights="CPU=1.0,MEM=0.125G,GRES/gpu=9.6"
>>
>> I currently have one recently finished job and one running job. sacct gives
>>
>> $ sacct --format=JobID,JobName,ReqTRES%50,AllocTRES%50,TRESUsageInAve%50,TRESUsageInMax%50
>> JobID         JobName     ReqTRES    AllocTRES    TRESUsageInAve    TRESUsageInMax
>> 154           interacti+
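To spell out the arithmetic behind step 4 of the 10-minute test above: the following is only a sketch, not part of the original posts. It assumes the CPU billing weight of 1.0 from the partition definition, PriorityCalcPeriod=1, the account name luchko_group, and that the job is still running when the check is made.

#!/bin/bash
# Sketch: expected RawUsage for the 10-minute, 100-CPU test.
#   billing            = 100 CPUs * 1.0      = 100
#   total RawUsage     = 100 billing * 600 s = 60,000 billing-seconds
#   increase per cycle = 100 billing * 60 s  =  6,000 per 1-minute PriorityCalcPeriod
# The observed increments of 5,998-6,000 are slightly lower because the
# 14-day half-life decay is applied at each update.

# Rough check of a single update cycle while the job is running; the last
# output line of the filtered sshare call is the account's RawUsage value.
before=$(sshare -A luchko_group -o RawUsage -P | tail -n 1)
sleep 60
after=$(sshare -A luchko_group -o RawUsage -P | tail -n 1)
echo "RawUsage increased by $((after - before)) (expected ~6000)"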
[slurm-users] Re: Setting up fairshare accounting
Hello,

I'm hoping someone can offer some suggestions.

I went ahead and started the database from scratch and reinitialized it to see if that would help and to try and understand how RawUsage is calculated. I ran two jobs of

sbatch --account=luchko_group --wrap="sleep 60" -p cpu -n 100

With the partition defined as

PriorityFlags=MAX_TRES
PartitionName=cpu Nodes=node[1-7] MaxCPUsPerNode=182 MaxTime=7-0:00:00 State=UP TRESBillingWeights="CPU=1.0,MEM=0.125G,GRES/gpu=9.6"

I expected each job to contribute 6000 to the RawUsage; however, one job contributed 3100 and the other 2800. And TRESRunMins stayed at 0 for all categories.

I'm at a loss as to what is going on.

Thank you,

Tyler

Sent with [Proton Mail](https://proton.me/mail/home) secure email.

On Tuesday, September 10th, 2024 at 9:03 PM, tluchko wrote:

> Hello,
>
> We have a new cluster and I'm trying to set up fairshare accounting. I'm trying to track CPU, MEM and GPU. It seems that billing for individual jobs is correct, but billing isn't being accumulated (TRESRunMins is always 0).
>
> In my slurm.conf, I think the relevant lines are
>
> AccountingStorageType=accounting_storage/slurmdbd
> AccountingStorageTRES=gres/gpu
> PriorityFlags=MAX_TRES
>
> PartitionName=gpu Nodes=node[1-7] MaxCPUsPerNode=384 MaxTime=7-0:00:00 State=UP TRESBillingWeights="CPU=1.0,MEM=0.125G,GRES/gpu=9.6"
> PartitionName=cpu Nodes=node[1-7] MaxCPUsPerNode=182 MaxTime=7-0:00:00 State=UP TRESBillingWeights="CPU=1.0,MEM=0.125G,GRES/gpu=9.6"
>
> I currently have one recently finished job and one running job. sacct gives
>
> $ sacct --format=JobID,JobName,ReqTRES%50,AllocTRES%50,TRESUsageInAve%50,TRESUsageInMax%50
> JobID         JobName     ReqTRES                                    AllocTRES                                  TRESUsageInAve                                      TRESUsageInMax
> 154           interacti+  billing=9,cpu=1,gres/gpu=1,mem=1G,node=1   billing=9,cpu=2,gres/gpu=1,mem=2G,node=1
> 154.interac+  interacti+                                             cpu=2,gres/gpu=1,mem=2G,node=1             cpu=00:00:00,energy=0,fs/disk=2480503,mem=3M,page+  cpu=00:00:00,energy=0,fs/disk=2480503,mem=3M,page+
> 155           interacti+  billing=9,cpu=1,gres/gpu=1,mem=1G,node=1   billing=9,cpu=2,gres/gpu=1,mem=2G,node=1
> 155.interac+  interacti+                                             cpu=2,gres/gpu=1,mem=2G,node=1
>
> billing=9 seems correct to me, since I have 1 GPU allocated, which has the largest billing weight (9.6). However, sshare doesn't show anything in TRESRunMins:
>
> $ sshare --format=Account,User,RawShares,FairShare,RawUsage,EffectvUsage,TRESRunMins%110
> Account       User      RawShares  FairShare  RawUsage   EffectvUsage  TRESRunMins
> root                                          21589714   1.00          cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
> abrol_group             2000                  0          0.00          cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
> luchko_group            2000                  21589714   1.00          cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
> luchko_group  tluchko   1          0.33       21589714   1.00          cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0,gres/gpumem=0,gres/gpuutil=0
>
> Why is TRESRunMins all 0 but RawUsage is not for tluchko? I have checked and slurmdbd is running.
>
> Thank you,
>
> Tyler
>
> Sent with [Proton Mail](https://proton.me/) secure email.

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
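One way to see what the accounting database itself recorded for those two one-minute jobs, independent of the half-life decay that sshare applies to RawUsage, is an sreport query along the following lines. This is only a sketch: the date range is an illustrative placeholder around the test day, and it assumes the billing TRES is being stored (it is part of Slurm's default TRES set).

# Sketch: per-account/user usage as recorded in slurmdbd, reported in billing
# TRES seconds.
sreport -t Seconds -T billing cluster AccountUtilizationByUser \
        Accounts=luchko_group Start=2024-09-19 End=2024-09-20

# If the full 60 s of each 100-CPU job had been captured, this would show
# roughly 2 * 100 * 60 = 12,000 billing-seconds for the account.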