Dear Slurm users,

I need to guarantee the availability of GPU resources to a Slurm account while still allowing that account to use other available GPU resources as well.

The guaranteed GPU resources should cover at least one GPU type and optionally up to three, as in:
Gres=gpu:type_1:N,gpu:type_2:P,gpu:type_3:Q

The version of Slurm I'm using is 20.11.9.


Ideas I came up with so far:

Placing a reservation seems like the simplest solution, but it forces users of the account to decide whether to submit their jobs inside or outside the reservation, based on a manual check of the GPU resources currently available in the cluster.
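
Just to illustrate, I mean something along these lines (reservation, account, and node names are placeholders, and I'd reserve whole nodes carrying the desired GPU types, since I'm not sure GRES-level reservations are possible in 20.11):

scontrol create reservation ReservationName=acct_gpus StartTime=now \
    Duration=30-00:00:00 Accounts=the_account Nodes=gpunode[01-02]

Users of the account would then have to opt in explicitly, e.g.:

sbatch --reservation=acct_gpus --gres=gpu:type_1:1 job.sh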

Changing the partition setup by moving nodes into a new partition for the account's exclusive use does look like a workable solution, especially when combined with an extension to job_submit.lua that prioritizes partitions for users of that account. However, it is overhead I'd like to avoid, as this is a time-limited scenario.
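
For reference, a rough sketch of the job_submit.lua extension I have in mind (partition and account names are made up, with "reserved_gpu" standing for the dedicated partition and "general" for the existing one):

function slurm_job_submit(job_desc, part_list, submit_uid)
    -- For jobs of the favoured account that don't request a partition
    -- explicitly, submit to both partitions so the scheduler can start
    -- the job in whichever one becomes available first.
    if job_desc.account == "the_account" and job_desc.partition == nil then
        job_desc.partition = "reserved_gpu,general"
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end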


I haven't looked at QOS yet, hoping for a shortcut from anyone who already has a working solution to this problem.

If you have such a solution, would you mind sharing it?

Thanks,
Stephan
