Hello everyone,
 
I administer a Slurm GPU cluster, and from time to time I need to run test jobs. The problem is that my users allocate every GPU as soon as it becomes available, which makes it impossible for me to test.
 
I could drain a node and wait until all jobs have finished, but as soon as I resume it to run my test job, the queued user jobs get scheduled on that node. Is there a way to drain a node and still allow jobs from selected users (e.g. root) to be scheduled on it? I currently don't have any kind of priority system on my cluster.
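For reference, the drain-and-resume workflow I described looks roughly like this with `scontrol`. It is only a sketch: the node name `gpu-node01`, the `DRY_RUN` switch and the `run` helper are hypothetical and only there for illustration. The comments point out where user jobs win the race for the node.

```shell
#!/bin/sh
# Hypothetical node name; replace with the node you want to test on.
NODE="${NODE:-gpu-node01}"

# Hypothetical helper: with DRY_RUN set, print each command instead of running it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# 1. Drain: no new jobs start on the node, running jobs continue until they finish.
run scontrol update NodeName="$NODE" State=DRAIN Reason="admin testing"

# 2. Wait until no jobs remain on the node (checked by hand, e.g. squeue -w "$NODE").

# 3. Resume: the node is schedulable again for everyone, so queued user jobs
#    can claim its GPUs before my test job is scheduled.
run scontrol update NodeName="$NODE" State=RESUME
```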
 
Thank you in advance and best regards

Reply via email to