Hi,

On Tue, Mar 10, 2020 at 05:49:07AM +0000, Rundall, Jacob D wrote:
> I need to update the configuration for the nodes in a cluster and I’d like to
> let jobs keep running while I do so. Specifically I need to add
> RealMemory=<blah> to the node definitions (NodeName=). Is it safe to do this
> for nodes where jobs are currently running? Or do I need to make sure nodes
> are drained while updating their config? We are using SelectType=select/linear
> on this cluster. Users would only be allocating complete nodes.
>
> Additionally, do I need to restart the Slurm daemons (slurmctld and slurmd)
> to make this change? I understand if I were adding completely new nodes I
> would need to do so (and that it’s advised to stop slurmctld, update config
> files, restart slurmd on all computes, and then start slurmctld). But is
> restarting the Slurm daemons also required when updating node config as I
> would like to do, or would ‘scontrol reconfigure’ suffice?
If you want the change to be persistent, you will need to update slurm.conf
(and/or the other files in /etc/slurm). That said, ‘scontrol reconfigure’
should suffice to propagate the change to the running slurmd daemons.

However, restarting slurmd and slurmctld is no big deal as far as I know,
provided you respect the timeouts you have configured. When slurmd restarts,
it picks up the jobs still running on the node; when slurmctld restarts, it
polls the nodes and regains knowledge of what is running. So it is no problem
to do this live. I would restart slurmctld first and then all the slurmds
(once slurmctld is back up and running properly).

Regards,
--
Andy
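
P.S. For what it's worth, a rough sketch of the sequence I have in mind; the
node names, the memory value, and the systemd unit names below are just
placeholders, adjust them to your own setup:

    # slurm.conf, kept identical on the controller and all compute nodes;
    # add RealMemory to the existing NodeName line(s), leave the rest as-is
    NodeName=node[01-16] RealMemory=191000

    # either push the change without restarting anything:
    scontrol reconfigure

    # or restart the daemons (controller first, then the computes),
    # assuming the usual systemd units are installed:
    systemctl restart slurmctld        # on the controller
    systemctl restart slurmd           # on each compute node

    # check that the new value was picked up:
    scontrol show node node01 | grep RealMemory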