[slurm-users] Using cgroups to hide GPUs on a shared controller/node

2019-05-17 Thread Dave Evans
We are using a single-system "cluster" and want some control of fair use with the GPUs. The users are not supposed to be able to use the GPUs until they have allocated the resources through Slurm. We have no head node, so slurmctld, slurmdbd, and slurmd all run on the same system. I have a conf
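A common way to get this behavior (a sketch, not the poster's actual configuration; the node name, GPU count, and device paths are assumptions) is to enable Slurm's cgroup device constraint, so that the GPU device files are only visible inside jobs that actually allocated a GPU GRES:

```
# slurm.conf (fragment) -- enable cgroup-based task containment and GPU GRES
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
GresTypes=gpu
NodeName=localhost Gres=gpu:2 State=UNKNOWN

# cgroup.conf -- limit device access to what the job was allocated
ConstrainDevices=yes

# gres.conf -- map the GRES entries to GPU device files (paths assumed)
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1
```

With `ConstrainDevices=yes`, a job submitted without `--gres=gpu:N` cannot open `/dev/nvidia*` even though it runs on the same machine as the controller, which is exactly the fair-use containment being asked about.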

Re: [slurm-users] Issue with x11

2019-05-17 Thread Alan Orth
Dear Christopher, I tried as you suggested and increased UnkillableStepTimeout from 60 to 120 seconds, but a few hours later three of my nodes were drained with reason "Kill task failed" again. We're not using cgroups. There is a bug¹ on SchedMD's tracker describing attempts to understand this err
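For reference, the timeout being discussed is a slurm.conf parameter; the value below mirrors the one tried in the message:

```
# slurm.conf -- how long slurmd waits for a job step's processes to die
# after SIGKILL before draining the node with "Kill task failed"
# (the default is 60 seconds)
UnkillableStepTimeout=120
```

As the thread notes, raising this only helps when processes eventually exit; if they are stuck in uninterruptible I/O (e.g. on a hung network filesystem), the node will still drain regardless of the timeout.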

Re: [slurm-users] Nodes shown by sinfo in partitions

2019-05-17 Thread Martijn Kruiten
Hi Fabio, My guess is that you can (partly) solve this by using the correct state in slurm.conf. Either CLOUD or FUTURE might be what you're looking for. See `man slurm.conf`. Kind regards, Martijn Kruiten On Fri, 2019-05-17 at 09:17 +, Verzelloni Fabio wrote: > Hello, > I have a question r
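A minimal slurm.conf sketch of what Martijn suggests (node names and hardware figures are assumptions, not from the thread): nodes that are currently moved to the other cluster are defined up front but given a state that keeps Slurm from treating them as down:

```
# slurm.conf (fragment)
# FUTURE: node is defined but not yet available; sinfo does not report it
# CLOUD: node is expected to come and go (power save / cloud bursting)
NodeName=node[01-05] State=UNKNOWN CPUs=32 RealMemory=128000
NodeName=node[06-10] State=FUTURE  CPUs=32 RealMemory=128000
PartitionName=batch Nodes=node[01-10] Default=YES MaxTime=INFINITE State=UP
```

When a node is physically moved back, its state can be updated (e.g. via `scontrol update NodeName=node06 State=RESUME`) rather than editing the configuration and restarting the daemons.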

[slurm-users] Nodes shown by sinfo in partitions

2019-05-17 Thread Verzelloni Fabio
Hello, I have a question related to the cloud feature, or any feature that can solve an issue that I have with my cluster. To make it simple, let's say that I have a set of nodes (say 10 nodes); if needed I move node(s) from cluster A to cluster B, and in my slurm.conf I define all the possible num