Hi,
We would like to do over-subscription on a cluster that's running in the
cloud. The cluster dynamically spins up and down CPU nodes as needed.
What we see is that the least-loaded algorithm causes the maximum number
of nodes specified in the partition to be spun up and each loaded with N
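For reference, the behaviour being described maps onto a handful of slurm.conf settings. A minimal sketch follows; the partition name, node range and oversubscription factor are made up purely for illustration, not taken from the poster's configuration:

# slurm.conf fragment - illustrative only; names and numbers are assumptions
SelectType=select/cons_tres
# CR_LLN (or LLN=YES on a partition) asks Slurm to place jobs on the least
# loaded nodes; it is deliberately not set in this sketch.
SelectTypeParameters=CR_Core_Memory
# Allow up to four jobs to share the same resources on the cloud partition.
PartitionName=cloud Nodes=cloud[001-064] State=UP OverSubscribe=FORCE:4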
We are pleased to announce the availability of Slurm version 21.08.6.
This includes a number of fixes since the last maintenance release was
made in December, including an important fix to a regression seen when
using the 'mpirun' command within a job script.
Slurm can be downloaded from https:/
Well, couldn't you either
1) salloc the lot on a maintenance day (bit manual) or
2) make your SuspendProgram check for currently active (maintenance)
reservations before shutting down a node (or some other flag)
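A rough sketch of what option 2 could look like, assuming Slurm invokes SuspendProgram with the node list as its first argument; the power-down helper at the end is a site-specific placeholder, not a real command:

#!/bin/bash
# Illustrative SuspendProgram wrapper; the power-down helper below is a
# placeholder, and the reservation parsing may need adapting to your site.
NODES="$1"

# Skip the power-down while a maintenance reservation is active.
if scontrol -o show reservations 2>/dev/null | grep 'Flags=.*MAINT' | grep -q 'State=ACTIVE'; then
    logger -t node-suspend "Active MAINT reservation, not suspending $NODES"
    exit 0
fi

# Otherwise hand the node list to the usual power-down mechanism.
exec /usr/local/sbin/power_down_nodes.sh "$NODES"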
Also, if you mod slurm.conf in preparation for maintenance days & just
exclude al
Hi Tina,
Thanks, it's not so much the config being fully up to date on compute nodes, it's
more that, when we transition into the system-wide maintenance day reservation, I
anticipate some of the compute nodes will be down due to power saving (I'm
expecting the reservation not to impact that, no job w
Hi everybody,
For forcing a run of your config management, as Tina suggested, you might
just add an
ExecStartPre=
line to your slurmd.service file?
This is somewhat unrelated to your problem but we are very successfully
using
ExecStartPre=-/usr/bin/nvidia-smi -L
in our slurmd.service file to make sure the GPU devices are initialized before slurmd starts.
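For completeness, a minimal sketch of how such a line can be added via a drop-in rather than by editing the packaged unit; Puppet is only an assumption here, substitute whatever your config management uses:

# /etc/systemd/system/slurmd.service.d/configmgmt.conf - illustrative
[Service]
# The leading "-" tells systemd to ignore a non-zero exit status, so a failed
# config-management run does not prevent slurmd from starting.
ExecStartPre=-/opt/puppetlabs/bin/puppet agent --onetime --no-daemonize

followed by a "systemctl daemon-reload" before the next restart.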
Hi David,
it's also not actually a problem if the slurm.conf is not exactly the
same immediately on boot - really. Unless there are changes that are very
fundamental, nothing bad will happen if they pick up a new copy after,
say, 5 or 10 minutes.
But it should be possible to - for example - fo
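One way to do that, sketched here with the shared source path purely as an assumption, is a small catch-up script run every few minutes from cron or a systemd timer; restarting slurmd makes the node re-read its configuration:

#!/bin/bash
# Illustrative slurm.conf catch-up script; both paths are assumptions.
SRC=/shared/slurm/slurm.conf
DST=/etc/slurm/slurm.conf

# Only copy and restart when the file has actually changed.
if ! cmp -s "$SRC" "$DST"; then
    cp "$SRC" "$DST"
    systemctl restart slurmd
fi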
Hi Brian,
>>For monitoring, I use a combination of netdata+prometheus. Data is gathered
>>whenever the nodes are up and stored for history. Yes, when the nodes are
>>powered down, there are empty gaps, but that is interpreted as the node is
>>powered down.
Ah, time-series will cope much better
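As a rough illustration of how a time-series setup copes with those gaps (assuming a node_exporter scrape job labelled "node" that stays configured while nodes are powered down; both details are assumptions for this sketch), per-node availability over the last day can be read straight off the up metric:

# Fraction of the last 24h each node's exporter was reachable; powered-down
# periods simply count as down time rather than producing errors.
avg_over_time(up{job="node"}[24h])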