Hi all,

I'm relatively new to Slurm, and my Internet searches so far have turned up plenty of examples from the client perspective but not much from the admin perspective on how to set this up. I'm hoping someone can point us in the right direction; this should be pretty simple...  :-)

We have a test cluster running Slurm 21.08.1 and are trying to figure out how to limit a partition to 200 CPU cores in use at once.  Basically, if someone submits a thousand single-core jobs, 200 of them should run and the other 800 should wait in the queue, with the next queued job starting each time a running one finishes.  Similarly, if someone has a 180-core job running and submits a 30-core job, the new job should wait in the queue until the 180-core job finishes.  And if someone submits a single job requesting 201 CPU cores, it should be rejected with an error.
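
For context, this is the sort of thing I've been experimenting with: a QOS attached to the partition that caps both the total cores in use and the size of any single job.  The partition name, node list, and QOS name below are just placeholders, and I'm not sure this is the intended mechanism:

    # QOS capping total CPUs in use across the partition and the per-job size,
    # with DenyOnLimit so an over-sized job is rejected at submit time
    sacctmgr add qos partcap
    sacctmgr modify qos partcap set GrpTRES=cpu=200 MaxTRESPerJob=cpu=200 Flags=DenyOnLimit

    # slurm.conf: attach the QOS to the partition and make sure limits are enforced
    PartitionName=test Nodes=node[01-10] QOS=partcap State=UP
    AccountingStorageEnforce=limits,qos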

According to the Slurm resource limits hierarchy, if a partition limit is set, we should be able to set up a user association that overrides it, for example if we want a particular user to be able to use 300 CPU cores in that partition.
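
On the override side, I was guessing at something like a higher association limit for that user (the user and account names below are placeholders), though I don't know whether an association limit actually wins over a partition-level cap, which is really what question 2 below is about:

    sacctmgr modify user where name=alice account=research set GrpTRES=cpu=300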

I can see in the Slurm documentation how to set up a max-nodes limit on a partition, but have not been able to find how to do the same with CPU cores.
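
What I did find is the MaxNodes setting on the partition definition in slurm.conf, e.g. (names again are placeholders):

    PartitionName=test Nodes=node[01-10] MaxNodes=5 State=UP

but as far as I can tell that limits the nodes for a single job rather than the CPU cores in use across the whole partition.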

My questions are:

1) How do we set up a CPU core limit on a partition that applies to all users?

2) How do we set up a user association that allows a single person to use more than the default CPU core limit set on the partition?

3) Is there a better way to accomplish this than the approach I'm describing?


For reference, Slurm accounting is set up and GPU allocations are working properly, so I think we are close and just missing something obvious for the CPU core limits.


Thank you,


-Dj
