I thought setting partitions to DOWN would kill jobs?
Amjad - in my experience, the slurmdbd & slurmctld server can be
rebooted with no effect on running jobs. You can't submit whilst it's
down, and I'm not precisely sure what happens to jobs that are just
finishing - but really the impact should be minimal.
(I've done exactly what you're needing to do - reboot so a change in
disk size is picked up - at least once with the cluster running.)
It is absolutely safe to restart slurmctld (and slurmdbd) with jobs
running on the cluster; it really is something that I, at least, do all
the time.
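
For anyone who wants to confirm this on their own cluster, here is a minimal sketch that compares the set of running job IDs before and after the restart. It assumes `squeue` is on the PATH and slurmctld is reachable when `running_job_ids()` is called; the helper names are made up for illustration. Note that a job which simply finished during the reboot will also show up in the difference, so treat the result as a list of jobs to look at, not as proof that anything was killed.

```python
import subprocess


def parse_job_ids(squeue_output):
    """Turn `squeue -h -o %i` output (one job ID per line) into a set."""
    return {line.strip() for line in squeue_output.splitlines() if line.strip()}


def running_job_ids():
    """Ask squeue for the IDs of currently running jobs (needs a live slurmctld)."""
    result = subprocess.run(
        ["squeue", "-h", "-t", "RUNNING", "-o", "%i"],
        capture_output=True, text=True, check=True,
    )
    return parse_job_ids(result.stdout)


def no_longer_running(before, after):
    """Job IDs that were running before the restart but are not running now."""
    return sorted(before - after)
```

Typical use would be to call `running_job_ids()` once before the reboot, save the result, call it again once slurmctld is back, and pass both sets to `no_longer_running()`.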
Tina
On 24/06/2021 10:16, Josef Dvoracek wrote:
hi,
Just set the partitions to "DOWN" to avoid unexpected behavior for users,
then reboot the slurm(ctl|dbd)+sql box. In my experience, running jobs
are not affected.
No need to drain nodes.
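
A rough sketch of the partition step, assuming the usual `scontrol update PartitionName=<name> State=DOWN` form and two placeholder partition names, `compute` and `gpu` (substitute your own). The helper names are invented for this example. It only prints the commands unless `dry_run=False` is passed, so it is safe to try on a login node first.

```python
import shlex
import subprocess


def partition_state_commands(partitions, state):
    """Build one `scontrol update` argv per partition (nothing is executed)."""
    if state not in ("UP", "DOWN"):
        raise ValueError("this sketch only handles UP and DOWN")
    return [
        ["scontrol", "update", f"PartitionName={name}", f"State={state}"]
        for name in partitions
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # "compute" and "gpu" are placeholder partition names.
    run_all(partition_state_commands(["compute", "gpu"], "DOWN"))
```

After the reboot, the same helper with `"UP"` produces the commands to reopen the partitions.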
josef
On 24. 06. 21 0:54, Amjad Syed wrote:
Hello all
We have a cluster running CentOS 7. Our Slurm scheduler is
running on a VM and we are running out of disk space for /var.
The Slurm InnoDB database is taking most of the space. We intend to
expand the vdisk for the Slurm server. This will require a reboot for
the changes to take effect. Do we have to stop users submitting jobs
by draining all partitions and then restart the server, that is
slurmctld, slurmdbd and mariadb? Or will restarting the Slurm VM have
no effect on running/pending jobs?
Sincerely
Amjad
--
Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator
Research Computing and Support Services
IT Services, University of Oxford
http://www.arc.ox.ac.uk http://www.it.ox.ac.uk