Hi,

> On 05.02.19 16:46, Ansgar Esztermann-Kirchner wrote:
> > [...]-- we'd like to have two "half nodes", where
> > jobs will be able to use one of the two GPUs, plus (at most) half of
> > the CPUs. With SGE, we've put two queues on the nodes, but this
> > effectively prevents certain maintenance jobs from running.
> > How would I configure these nodes in Slurm?
>
> why don't you use an additional "maintenance" queue/partition
> containing the whole nodes?
In theory, that's a clean solution, and I'm very willing to test this with Slurm once I've finished the partitioning (a rough sketch of what I'd try is below). With SGE, there are some problems, though. To do this properly, you'd want the queues to suspend each other (so as not to overcommit resources); but in the past, this has led SGE to schedule the first slot (the master task) of a job on one queue and the remaining slots on the other, so the job would never start because one of the two queues was always suspended. Experience also shows that SGE quite easily loses interest in making reservations (e.g. when using -l exclusive), and I suspect that a pair of mutually suspending queues might be such a case.

> > Let's agree on "other" ;)
> use the OS to partition the resources on the host -- VM, systemd-nspawn,
> ... .

I'll keep that in mind.
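For the Slurm side, this is roughly what I'd try once the partitioning is done. Untested so far, and the node name, core count, device paths, and group name below are invented for the example:

  # slurm.conf: one node with two GPUs and 40 cores (numbers made up)
  NodeName=gpu01 CPUs=40 Gres=gpu:2 State=UNKNOWN
  # default partition for the "half node" jobs
  PartitionName=half Nodes=gpu01 Default=YES
  # whole-node partition for maintenance jobs, as you suggested
  PartitionName=maint Nodes=gpu01 AllowGroups=admin Default=NO

  # gres.conf: bind each GPU to one half of the cores
  NodeName=gpu01 Name=gpu File=/dev/nvidia0 Cores=0-19
  NodeName=gpu01 Name=gpu File=/dev/nvidia1 Cores=20-39

Jobs in "half" would then be submitted with --gres=gpu:1 --gres-flags=enforce-binding, which, if I read the docs correctly, restricts them to the cores bound to their GPU, i.e. half the node. And since Slurm accounts for a node's CPUs and GRES across all partitions, the overlapping "maint" partition should not overcommit the node the way two mutually suspending SGE queues can.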
Thank you very much,

A.
--
Ansgar Esztermann
Sysadmin
http://www.mpibpc.mpg.de/grubmueller/esztermann