Re: [slurm-users] Slurm 21.08.8-2 upgrade

2022-05-09 Thread Mark Dixon
On Thu, 5 May 2022, Legato, John (NIH/NHLBI) [E] wrote: ... We are in the process of upgrading from Slurm 21.08.6 to Slurm 21.08.8-2. We've upgraded the controller and a few partitions' worth of nodes. We noticed the nodes are losing contact with the controller even though slurmd is still up. We thought ...
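A minimal sketch of commands for confirming this symptom, assuming standard Slurm and systemd tooling (the node name is a placeholder, not from John's message):

    # On the controller: list nodes Slurm considers unreachable, with reasons
    sinfo -R

    # Inspect the controller's view of one suspect node (name is hypothetical)
    scontrol show node node001 | grep -i -e state -e reason

    # On the node itself: slurmd can be running yet unable to reach slurmctld
    systemctl status slurmd
    scontrol ping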

Re: [slurm-users] Slurm 21.08.8-2 upgrade

2022-05-06 Thread Paul Edmon
We upgraded from 21.08.6 to 21.08.8-1 yesterday morning, but overnight we saw the communications issues described by Tim W. We upgraded to 21.08.8-2 this morning, and that did the trick and resolved all the communications problems we were having. -Paul Edmon- On 5/6/2022 4:38 AM, Ole Holm Nielsen ...

Re: [slurm-users] Slurm 21.08.8-2 upgrade

2022-05-06 Thread Ole Holm Nielsen
Hi Juergen, My upgrade report: We upgraded from 21.08.7 to 21.08.8-1 yesterday for the entire cluster, and we didn't have any issues. I built RPMs from the tarball and simply did "yum update" on the nodes (one partition at a time) while the cluster was running in full production mode. All s...
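For reference, a sketch of that rolling-upgrade procedure, assuming RPM-based nodes and pdsh for fan-out (the version and node range are placeholders, not from Ole's report):

    # Build Slurm RPMs straight from the release tarball
    rpmbuild -ta slurm-21.08.8-1.tar.bz2

    # Update one partition's nodes at a time while the rest keep running jobs
    pdsh -w node[001-032] 'yum -y update slurm\*'

    # Restart slurmd on the updated nodes so they run the new version
    pdsh -w node[001-032] 'systemctl restart slurmd'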

Re: [slurm-users] Slurm 21.08.8-2 upgrade

2022-05-05 Thread Juergen Salk
Hi John, this is really bad news. We have stopped our rolling update from Slurm 21.08.6 to Slurm 21.08.8-1 today for exactly that reason: the state of compute nodes already running slurmd 21.08.8-1 suddenly started flapping between responding and not responding, but all other nodes that were still r...
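One way to hold a rolling update mid-stream is to drain the nodes that have not yet been upgraded so no new jobs land on them until a fixed release is available; a sketch, with hypothetical node names:

    # Drain the remaining nodes (range and reason text are placeholders)
    scontrol update NodeName=node[033-064] State=DRAIN Reason="pausing 21.08.8-1 rollout"

    # Once a fixed release is installed, return them to service
    scontrol update NodeName=node[033-064] State=RESUME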