On Thu, 5 May 2022, Legato, John (NIH/NHLBI) [E] wrote:
...
We are in the process of upgrading from Slurm 21.08.6 to Slurm
21.08.8-2. We’ve upgraded the controller and a few partitions' worth of
nodes. We notice the nodes are losing contact with the controller but
slurmd is still up. We thought
We upgraded from 21.08.6 to 21.08.8-1 yesterday morning, but overnight we
saw the communications issues described by Tim W. We upgraded to
21.08.8-2 this morning, and that resolved all the communications
problems we were having.
-Paul Edmon-
On 5/6/2022 4:38 AM, Ole Holm Nielse
Hi Juergen,
My upgrade report: We upgraded from 21.08.7 to 21.08.8-1 yesterday for the
entire cluster, and we didn't have any issues. I built RPMs from the
tarball and simply did "yum update" on the nodes (one partition at a
time) while the cluster was running in full production mode. All s
Hi John,
This is really bad news. We have stopped our rolling update from Slurm
21.08.6 to Slurm 21.08.8-1 today for exactly that reason: the state of
compute nodes already running slurmd 21.08.8-1 suddenly started
flapping between responding and not responding, but all other nodes
that were still r