Re: [slurm-users] Effect of slurmctld and slurmdb going down on running/pending jobs

Tina Friedrich Thu, 24 Jun 2021 02:27:28 -0700

I thought setting partitions to DOWN will kill jobs?

Amjad - in my experience, the slurmdbd & slurmctld server can be rebooted with no effect on running jobs. You can't submit whilst it's down, and I'm not precisely sure what happens to jobs that are just finishing - but really the impact should be minimal.

(I've done exactly what you're needing to do - reboot so a change in disk size is picked up - at least once with the cluster running.)

It is absolutely safe to restart slurmctld (and slurmdbd) with jobs running on the cluster; that really is something that at least I do all the time.


Tina

On 24/06/2021 10:16, Josef Dvoracek wrote:

hi,
just set the partitions to "DOWN" to avoid unexpected behavior for users and reboot the slurm(ctl|dbd)+sql box. In my experience, running jobs are not affected.
No need to drain nodes.
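
A minimal sketch of the "set the partitions to DOWN" step, assuming the standard `scontrol update PartitionName=... State=...` and `sinfo -h -o "%R"` commands are available on the admin host. The helper names (`partition_state_cmd`, `list_partitions`, `set_all_partitions`) and the dry-run default are illustrative assumptions, not part of Slurm or of this thread.

```python
import subprocess

# Partition states accepted by "scontrol update PartitionName=... State=..."
VALID_STATES = ("UP", "DOWN", "DRAIN", "INACTIVE")


def partition_state_cmd(partition, state):
    """Build the scontrol command that sets one partition's state."""
    if state not in VALID_STATES:
        raise ValueError(f"unknown partition state: {state}")
    return ["scontrol", "update", f"PartitionName={partition}", f"State={state}"]


def list_partitions():
    """Return the partition names reported by sinfo (needs a reachable slurmctld)."""
    out = subprocess.run(
        ["sinfo", "-h", "-o", "%R"],
        check=True, capture_output=True, text=True,
    ).stdout
    # sinfo may print a partition once per node-state group, so de-duplicate.
    return sorted(set(out.split()))


def set_all_partitions(state, dry_run=True, partitions=None):
    """Set every partition to `state`; with dry_run=True only print the commands."""
    names = partitions if partitions is not None else list_partitions()
    cmds = [partition_state_cmd(name, state) for name in names]
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return cmds
```

Under these assumptions, `set_all_partitions("DOWN", dry_run=False)` would be run before the reboot and `set_all_partitions("UP", dry_run=False)` once slurmctld is back. The partition list is worth saving beforehand, since `list_partitions()` cannot query the controller while it is down.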

josef

On 24. 06. 21 0:54, Amjad Syed wrote:
Hello all
We have a cluster running CentOS 7. Our Slurm scheduler is running on a VM and we are running out of disk space for /var. The Slurm InnoDB database is taking most of the space. We intend to expand the vdisk for the Slurm server. This will require a reboot for the changes to take effect. Do we have to stop users submitting jobs by draining all partitions and then restart the server, that is slurmctld, slurmdbd and mariadb? Or will restarting the Slurm VM have no effect on running/pending jobs?
Sincerely

Amjad


--
Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator

Research Computing and Support Services
IT Services, University of Oxford
http://www.arc.ox.ac.uk http://www.it.ox.ac.uk
