This is an interesting question, and I was thinking the same as Brian.

For the sake of discussion, I'm not sure MaxTRESPerNode will achieve the desired 
job distribution, because the limit is applied per job, not across all of a 
user's jobs.  But… I've never used this limit, and I may be interpreting the 
docs incorrectly.  Definitely worth testing.
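If you do test it, something like the following is roughly what I'd try — this 
is an untested sketch (per my caveat above), reusing the QoS name from the 
original question:

```
# Sketch: cap each job at one GPU per node, while still allowing
# up to three concurrent jobs per user. Untested; verify that
# MaxTRESPerNode behaves as expected across a user's jobs.
sacctmgr add qos test-limit-GPUs \
    MaxJobsPerUser=3 \
    MaxTRESPerNode=gres/gpu=1
```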

Combining that with SelectTypeParameters=CR_LLN would help distribute the 
workloads across the least-loaded nodes, but it also wouldn't guarantee one job 
per user per node.
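For reference, that would be a slurm.conf change along these lines — assuming a 
cons_tres setup (adjust the consumable-resource flags to match your site):

```
# CR_LLN schedules jobs onto the least-loaded nodes first, which
# tends to spread a user's GPU jobs across servers rather than
# packing them onto one node. It does not enforce a hard limit.
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory,CR_LLN
```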

This could also be achieved by the user structuring their jobs and selecting 
the right combination of sbatch/srun options.
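For example, a single job that requests one GPU on each of three nodes would 
naturally give "one GPU per server" without any QoS tricks — hypothetical 
script name, but real sbatch options:

```
# One job spanning three nodes, one GPU per node. The script name
# "train.sh" is just a placeholder for illustration.
sbatch --nodes=3 --ntasks-per-node=1 --gpus-per-node=1 train.sh
```

The trade-off is that this relies on the user cooperating, rather than the 
scheduler enforcing the limit.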

I’m not sure if there’s a baked-in set of options that will achieve all 
requirements.  It might require a custom select plugin?!

I don't know if any of this moves the needle...  Good question!  Excited to 
learn more and to hear whether a solution exists.  Don't forget to share what 
you find.

Thanks,


Sebastian Smith

Seattle Children’s Hospital

DevOps Engineer, Principal

Email: [email protected]

Web: https://seattlechildrens.org



--



From: Brian Andrus via slurm-users <[email protected]>
Date: Thursday, October 23, 2025 at 10:03
To: [email protected] <[email protected]>
Subject: [slurm-users] Re: Limit number of allocated GPUs



You may want to look at MaxTRESPerNode and possibly MaxTRESPerJob. Doing it 
PerUser means all running jobs for that user, which may not be what you want.

Brian Andrus

On 10/22/2025 11:44 PM, Gestió Servidors via slurm-users wrote:
Hello,

I have three nodes, each serving 2 GPUs. I would like to limit (QoS??) a user 
to only one GPU from each server, while still allowing the user to use three 
GPUs simultaneously if each GPU belongs to a different server. With this QoS 
“sacctmgr add qos test-limit-GPUs MaxJobsPerUser=3 
MaxTRESPerUser=gres/gpu=1” I can limit the user to one GPU, but then the user 
can't run another job on a GPU from another server. How must I configure a QoS 
(or another method) to allow more than one job requesting GPUs, but never on 
the same server?

Thanks.



CONFIDENTIALITY NOTICE: This e-mail, including any attachments, is for the sole 
use of the intended recipient(s) and may contain confidential and privileged 
information protected by law. Any unauthorized review, use, disclosure or 
distribution is prohibited. If you are not the intended recipient, please 
contact the sender by reply e-mail and destroy all copies of the original 
message.
-- 
slurm-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]