Short answer: yes.

It's not risk-free, but you should be fine as long as you increase all the timeouts to four times your worst-case estimate and make sure you understand the upgrades section of this link:

https://slurm.schedmd.com/quickstart_admin.html

Keep it open for reference while you work.

Antony

On Wed, 26 May 2021, 19:25 Will Dennis, <wden...@nec-labs.com> wrote:

> Hi all,
>
> About to embark on my first Slurm upgrade (building from source now, into
> a versioned path /opt/slurm/<vernum>/ which is then symlinked to
> /opt/slurm/current/ for the “in-use” one…) This is a new cluster, running
> 20.11.5 (which we now know has a CVE that was fixed in 20.11.7) but I have
> researchers running jobs on it currently. As I’m still building out the
> cluster, I found today that all Slurm source tarballs before 20.11.7 were
> withdrawn by SchedMD. So, need to upgrade at least the -ctld and -dbd nodes
> before I can roll any new nodes out on 20.11.7…
>
> As I have at least one researcher that is running some long multi-day
> jobs, can I down the -dbd and -ctld nodes and upgrade them, then put them
> back online running the new (latest) release, without munging the jobs on
> the running worker nodes?
>
> Thanks!
>
> Will
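
For reference, here is a minimal sketch of the timeout advice in the reply. It assumes the relevant settings are SlurmctldTimeout and SlurmdTimeout in slurm.conf, the two that the upgrade section of the linked guide suggests raising. The 14400-second value is purely illustrative: it assumes a worst-case outage estimate of one hour, multiplied by four. Substitute your own estimate.

```
# slurm.conf (fragment), illustrative values only
# Assumed worst-case downtime: 1 hour (3600 s), times 4 = 14400 s
SlurmctldTimeout=14400
SlurmdTimeout=14400
```

The new values should take effect after running `scontrol reconfigure`, and the original values can be restored the same way once the upgrade is complete.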