Hi,

> On 05.02.19 16:46, Ansgar Esztermann-Kirchner wrote:
> > [...]-- we'd like to have two "half nodes", where
> > jobs will be able to use one of the two GPUs, plus (at most) half of
> > the CPUs. With SGE, we've put two queues on the nodes, but this
> > effectively prevents certain maintenance jobs from running.
> > How would I configure these nodes in Slurm?
>
> why don't you use an additional "maintenance" queue/partition
> containing the whole nodes?
In theory, that's a clean solution, and I'm very willing to test this with Slurm once I've finished the partitioning (a rough sketch of what I'd try is below). With SGE, there are some problems, though. To do this properly, you'd want the queues to suspend each other (so as not to overcommit resources); but in the past, this has led SGE to schedule the first slot (the master task) of a job on one queue and the remaining slots on the other, so the job would never start because one of the two queues was always suspended. Experience also shows that SGE quite easily loses interest in making reservations (e.g. when using -l exclusive), and I suspect that a pair of mutually suspending queues might be such a case.

> > Let's agree on "other" ;)
> use the OS to partition the resources on the host -- VM, systemd-nspawn,
> ... .

I'll keep that in mind.
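For the Slurm side, this is roughly what I'd try once the partitioning is done. Untested so far, and the node name, core count, device paths, and group name below are invented for the example:

  # slurm.conf: one node with two GPUs and 40 cores (numbers made up)
  NodeName=gpu01 CPUs=40 Gres=gpu:2 State=UNKNOWN
  # default partition for the "half node" jobs
  PartitionName=half Nodes=gpu01 Default=YES
  # whole-node partition for maintenance jobs, as you suggested
  PartitionName=maint Nodes=gpu01 AllowGroups=admin Default=NO

  # gres.conf: bind each GPU to one half of the cores
  NodeName=gpu01 Name=gpu File=/dev/nvidia0 Cores=0-19
  NodeName=gpu01 Name=gpu File=/dev/nvidia1 Cores=20-39

Jobs in "half" would then be submitted with --gres=gpu:1 --gres-flags=enforce-binding, which, if I read the docs correctly, restricts them to the cores bound to their GPU, i.e. half the node. And since Slurm accounts for a node's CPUs and GRES across all partitions, the overlapping "maint" partition should not overcommit the node the way two mutually suspending SGE queues can.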
Thank you very much,

A.
--
Ansgar Esztermann
Sysadmin
http://www.mpibpc.mpg.de/grubmueller/esztermann