Hello,

I need your help. I have a user who is able to exceed his account limit, and I can't figure out why.

We have two partitions: a lower-tier one named "long" that spans all hosts, and a higher-tier one named "sfglab" that contains only one host. Here is their configuration:

PartitionName=long Nodes=dgx-[1-4],sr-[1-3] MaxTime=10-0 State=UP PriorityTier=10000 QOS=long OverSubscribe=NO DefaultTime=2-00:00:00 PreemptMode=suspend TRESBillingWeights=CPU=1,Mem=0.062G,GRES/gpu=72.458

PartitionName=sfglab Nodes=dgx-1 MaxTime=10-0 State=UP PriorityTier=20000 PreemptMode=off OverSubscribe=NO AllowAccounts=sfglab AllowGroups=sfglab TRESBillingWeights=CPU=1,Mem=0.062G,GRES/gpu=72.458 Hidden=NO

The partition "long" also has a QOS assigned (also named "long") with the limit cpu=450,gres/gpu=28,mem=6T.

We also have an account (also named "sfglab") with limits set as follows:

sacctmgr modify account where name=sfglab set GrpTRES=cpu=300,gres/gpu=20,mem=5T


As a result, the user is able to allocate more than 32 GPUs, well above the account's gres/gpu=20 limit.
I don't understand why :(
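
To make the mismatch concrete, here is a minimal shell sketch that compares the GPU figure reported above against the two configured limits. The helper name tres_get and the variable names are hypothetical, introduced only for this illustration. The TRES strings are copied from this post, and 32 is taken as a lower bound on the observed allocation.

```shell
#!/bin/sh
# Hypothetical helper: extract the value of one TRES key (e.g. gres/gpu)
# from a comma-separated TRES string such as "cpu=300,gres/gpu=20,mem=5T".
tres_get() {
    printf '%s\n' "$1" | tr ',' '\n' | awk -F= -v key="$2" '$1 == key { print $2 }'
}

# Limits copied from this post.
account_grptres="cpu=300,gres/gpu=20,mem=5T"   # GrpTRES on account sfglab
qos_long_limit="cpu=450,gres/gpu=28,mem=6T"    # limit on QOS long

# GPU count reported in this post (treated as a lower bound).
observed_gpus=32

account_gpus=$(tres_get "$account_grptres" gres/gpu)
qos_gpus=$(tres_get "$qos_long_limit" gres/gpu)

if [ "$observed_gpus" -gt "$account_gpus" ]; then
    echo "observed $observed_gpus GPUs exceeds account limit ($account_gpus)"
fi
if [ "$observed_gpus" -gt "$qos_gpus" ]; then
    echo "observed $observed_gpus GPUs exceeds QOS limit ($qos_gpus)"
fi
```

The strings here are static copies. On a live cluster the stored account limit could instead be read with something like `sacctmgr show assoc where account=sfglab format=Account,User,GrpTRES`, so that the check runs against what the accounting database actually holds.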

We use Slurm 23.02.2.

--
Best regards,
*Michał Kadlof*
Head of the high performance computing center
Eden^N cluster administrator
Structural and Functional Genomics Laboratory
Faculty of Mathematics and Computer Science
Warsaw University of Technology
