Re: [slurm-users] Limit concurrent gpu resources

2019-04-24 Thread Prentice Bisbal

Here's how we handle this: Create a separate partition named debug that also contains that node. Give the debug partition a very short time limit, say 30 to 60 minutes: long enough for debugging, but too short to do any real work. Make the priority of the debug partition much higher than t
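
A rough sketch of the debug-partition idea described above, as a dry run that only prints the command. The node name, the 60-minute limit, and the tier value are placeholders, not taken from the post, and using PriorityTier to express "higher priority" is an assumption about what was meant.

```shell
#!/bin/sh
# Hypothetical values: adjust to your site. None of these come from the thread.
NODE="gpunode01"   # assumed node name
MAX_TIME="60"      # minutes; the post suggests 30 to 60
TIER="10"          # assumed; should exceed the regular partition's PriorityTier

debug_partition_cmd() {
    # Build the scontrol command that would create the debug partition.
    echo "scontrol create PartitionName=debug Nodes=$NODE MaxTime=$MAX_TIME PriorityTier=$TIER"
}

# Dry run: print the command instead of executing it.
debug_partition_cmd
```

A partition created with scontrol does not survive a slurmctld restart, so a permanent setup would more likely carry the same settings on a PartitionName=debug line in slurm.conf.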

Re: [slurm-users] Limit concurrent gpu resources

2019-04-24 Thread Renfro, Michael

We put a ‘gpu’ QOS on all our GPU partitions, and limit jobs per user to 8 (our GPU capacity) via MaxJobsPerUser. Extra jobs get blocked, allowing other users to queue jobs ahead of the extras.

    # sacctmgr show qos gpu format=name,maxjobspu
          Name MaxJobsPU
    ---------- ---------
           gpu
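
For reference, a minimal sketch of how a QOS like this might be created, again as a dry run that only prints the sacctmgr commands. The QOS name and the limit of 8 come from the message above; the exact commands the poster ran are not shown in the thread, so this sequence is an assumption.

```shell
#!/bin/sh
QOS_NAME="gpu"   # QOS name from the message
MAX_JOBS="8"     # per-user job limit from the message

qos_setup_cmds() {
    # -i answers the confirmation prompt automatically.
    echo "sacctmgr -i add qos $QOS_NAME"
    echo "sacctmgr -i modify qos $QOS_NAME set MaxJobsPerUser=$MAX_JOBS"
    # Verify the result, as in the message.
    echo "sacctmgr show qos $QOS_NAME format=name,maxjobspu"
}

# Dry run: print the commands instead of executing them.
qos_setup_cmds
```

For the limit to take effect, the QOS presumably also has to be attached to each GPU partition (QOS=gpu on the partition line in slurm.conf) and AccountingStorageEnforce has to include limits.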