This is related to this other thread:
https://groups.google.com/g/slurm-users/c/88pZ400whu0/m/9FYFqKh6AQAJ
AFAIK, the only rudimentary solution is the MaxCPUsPerNode partition flag
combined with separate gpu and cpu partitions, but having something like
"CpusReservedPerGpu" would be nice.
@Aaron
I thought of doing this, but I'm guessing you don't have preemption enabled.
With preemption enabled this becomes more complicated and error-prone, but
I'll think some more about it. It'd be nice to leverage Slurm's scheduling
engine and just add this constraint.
Relu
On 2020-10-20 16:20, Aaron wrote:
I look after a very heterogeneous GPU Slurm setup, and some nodes have
rather few cores. We use a job_submit lua script which calculates the
number of requested CPU cores per GPU. This is then used to scan through
a table of 'weak nodes' based on a 'max cores per gpu' property. The
node names are app
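The script itself isn't shown (the message is cut off above), but here is a
rough sketch of how such a job_submit.lua could look. Everything concrete in it
is an assumption: the weak_nodes table and its limits, parsing the GPU count out
of job_desc.tres_per_node (older Slurm versions expose the request differently),
and the choice to steer oversized requests away by appending the weak nodes to
job_desc.exc_nodes.

    -- job_submit.lua sketch: keep jobs that want many CPU cores per GPU off "weak" nodes.
    -- The node names and limits below are invented for illustration.
    local weak_nodes = {
       { nodes = "weakgpu[01-04]", max_cores_per_gpu = 4 },
       { nodes = "weakgpu[05-08]", max_cores_per_gpu = 6 },
    }

    -- Unset numeric fields show up as nil or as Slurm's NO_VAL sentinels; treat both as "not given".
    local NO_VAL16, NO_VAL = 0xfffe, 0xfffffffe
    local function given(v)
       if v == nil or v == NO_VAL16 or v == NO_VAL then return nil end
       return v
    end

    -- Pull a GPU count out of a TRES string such as "gpu:2" or "gres/gpu:tesla:2".
    local function gpus_from_tres(tres)
       if tres == nil or tres == "" then return 0 end
       local n = string.match(tres, "gpu[:%w]*:(%d+)")
       if n then return tonumber(n) end
       if string.find(tres, "gpu") then return 1 end  -- "gpu" with no count means one
       return 0
    end

    function slurm_job_submit(job_desc, part_list, submit_uid)
       local gpus = gpus_from_tres(job_desc.tres_per_node)
       if gpus == 0 then
          return slurm.SUCCESS                        -- not a GPU job, nothing to do
       end

       -- Rough estimate of the requested cores (ignores --cpus-per-gpu and similar options).
       local cpus = (given(job_desc.cpus_per_task) or 1) * (given(job_desc.num_tasks) or 1)
       local ratio = cpus / gpus

       -- Collect every weak-node group whose per-GPU core budget this job would exceed.
       local to_exclude
       for _, entry in ipairs(weak_nodes) do
          if ratio > entry.max_cores_per_gpu then
             to_exclude = (to_exclude and (to_exclude .. ",") or "") .. entry.nodes
          end
       end

       if to_exclude then
          slurm.log_user(string.format(
             "requesting %.1f cores per GPU; excluding small GPU nodes: %s", ratio, to_exclude))
          if job_desc.exc_nodes ~= nil and job_desc.exc_nodes ~= "" then
             job_desc.exc_nodes = job_desc.exc_nodes .. "," .. to_exclude
          else
             job_desc.exc_nodes = to_exclude
          end
       end
       return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
       return slurm.SUCCESS
    end

    return slurm.SUCCESS

To try something along these lines, set JobSubmitPlugins=lua in slurm.conf and
install the script as job_submit.lua in the same directory as slurm.conf.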
Hi all,
We have a GPU cluster and have run into this issue occasionally. Assume
four GPUs per node: when a user requests one GPU on such a node together
with all the cores, or all the RAM, the other three GPUs are wasted for
the duration of the job, as Slurm has no more cores or RAM available to
allocate.
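(To make that concrete, with made-up numbers: on a 64-core node with four GPUs,
a submission along the lines of

    sbatch --gres=gpu:1 --ntasks=1 --cpus-per-task=64 train.sh

where train.sh is just a placeholder script, takes one GPU but every core on the
node, so the remaining three GPUs sit idle until the job ends.)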