Hi Loris,
mainly, we didn't want to have too many partitions, so we were after a
way to avoid separating the GPU nodes out.
Partly it is because we wanted to be able to easily use 'idle' CPUs on
GPU nodes - although I currently only allow that on some of them (I
simply tag those with 'cpu' as well). Having them in a separate
partition would mean users would have to change where they submit to,
or I would have to mess with that in the verifier...
Also - for various reasons we'd end up with a lot of partitions
(something like 10 or 12), which seemed excessive. We liked it better
having the GPU nodes not separated out and teaching users to specify
their resources properly (the GPUs are a very mixed bunch, as well).
We did think about having 'hidden' GPU partitions instead of wrangling
it with features, but we couldn't see any benefit to that.
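Under a feature-tagging scheme like that, user submissions might look something like the following (illustrative only - the feature names, resource sizes, and script names here are made up, not taken from Tina's actual setup):

```shell
# Plain CPU job: no constraint needed; the submit filter would add
# --constraint=cpu automatically (hypothetical setup).
sbatch --cpus-per-task=4 cpu_job.sh

# GPU job: request the gres explicitly; on a mixed GPU fleet a feature
# constraint can additionally narrow it to a particular model.
sbatch --gres=gpu:1 --constraint=gpu gpu_job.sh
```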
Tina
On 02/07/2021 06:48, Loris Bennett wrote:
Hi Tina,
Tina Friedrich <tina.friedr...@it.ox.ac.uk> writes:
Hi Brian,
sometimes it would be nice if Slurm had what Grid Engine calls a 'forced
complex' (i.e. a feature that you *have* to request in order to land on a
node that has it), wouldn't it?
I do something like that for all of my 'special' nodes (GPU, KNL, ...) - I
want to stop jobs that don't request that resource (or that shouldn't run on
that architecture) from landing on them. I 'tag' all nodes with a relevant
feature (cpu, gpu, knl, ...), and have a Lua submit verifier that checks for
a 'relevant' feature (or a --gres=gpu or something) and, if there isn't one,
adds the 'cpu' feature to the request.
Works for us!
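A minimal sketch of what such a job_submit.lua filter could look like - the feature names ('cpu'), the gres matching, and the job_desc field names are assumptions here (field names like job_desc.gres vary between Slurm versions), not Tina's actual code:

```lua
-- Sketch of a job_submit.lua verifier that tags feature-less,
-- non-GPU jobs with a default 'cpu' feature. Assumptions: nodes are
-- tagged with 'cpu'/'gpu' features, and GPU requests appear in a
-- gres string containing "gpu".

-- Pure helper: does this job need the default 'cpu' feature added?
function needs_cpu_tag(features, gres)
  -- Job already requested a node feature: leave it alone.
  if features ~= nil and features ~= "" then
    return false
  end
  -- Job requested a GPU gres: leave it alone.
  if gres ~= nil and string.find(gres, "gpu", 1, true) then
    return false
  end
  return true
end

-- Called by slurmctld for every submission.
function slurm_job_submit(job_desc, part_list, submit_uid)
  if needs_cpu_tag(job_desc.features, job_desc.gres) then
    job_desc.features = "cpu"
  end
  return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, submit_uid)
  return slurm.SUCCESS
end
```

The effect is that a job which requests neither a feature nor a GPU can only land on 'cpu'-tagged nodes, without users having to change their scripts.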
We just have the GPU nodes in a separate partition 'gpu' which users
have to specify if they want a GPU. How does that approach differ from
yours in terms of functionality for you (or the users)?
The main problem with our approach is that the CPUs on the GPU nodes can
remain idle while there is a queue for the regular CPU nodes. What I
would like is to allow short CPU-only jobs to run on the GPU nodes, while
only letting GPU jobs run there for longer - which I guess I could
probably do within the submit plugin.
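That time-based policy could be sketched in a job_submit.lua along these lines - the 30-minute cutoff, the 'cpu'/'gpu' feature names, and the job_desc field names are all hypothetical placeholders, not an actual implementation (note that constraint OR syntax like "cpu|gpu" requires the nodes to carry those features):

```lua
-- Sketch: short CPU-only jobs may spill onto idle GPU nodes,
-- long CPU-only jobs stay on plain CPU nodes. Assumes nodes are
-- tagged with 'cpu'/'gpu' features. Cutoff is a made-up value.
SHORT_LIMIT_MINUTES = 30

-- Pure helper: which feature constraint should a CPU-only job get?
-- (job_desc.time_limit is expressed in minutes.)
function cpu_job_feature(time_limit_minutes)
  if time_limit_minutes ~= nil
     and time_limit_minutes <= SHORT_LIMIT_MINUTES then
    -- Short job: either plain CPU nodes or idle GPU nodes will do.
    return "cpu|gpu"
  end
  -- Long (or unlimited) job: plain CPU nodes only.
  return "cpu"
end

function slurm_job_submit(job_desc, part_list, submit_uid)
  local wants_gpu = job_desc.gres ~= nil
                    and string.find(job_desc.gres, "gpu", 1, true)
  if not wants_gpu
     and (job_desc.features == nil or job_desc.features == "") then
    job_desc.features = cpu_job_feature(job_desc.time_limit)
  end
  return slurm.SUCCESS
end
```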
Cheers,
Loris
Tina
On 01/07/2021 15:08, Brian Andrus wrote:
All,
I have a partition where one of the nodes has a node-locked license.
That license is not used by everyone that uses the partition.
They are cloud nodes, so weights do not work (there is an open bug about
that).
I need to have jobs 'avoid' that node by default. I am thinking I can use a
feature constraint, but that seems to only apply to those that want the
feature. Since we have so many other users, it isn't feasible to have them
modify their scripts, so having it avoid by default would work.
Any ideas how to do that? A Lua submit plugin, perhaps?
Brian Andrus
--
Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator
Research Computing and Support Services
IT Services, University of Oxford
http://www.arc.ox.ac.uk http://www.it.ox.ac.uk