Hi,
Does scontrol ping, run from the node, show the Slurm server as up? If so,
munge is fine. I am betting this is not the problem, but it is such an easy
check.
Ensure you have the same slurm.conf on master and client.
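
If it helps, here is a minimal sketch in Python for that slurm.conf comparison. It assumes you have copied the node's file to somewhere the master's copy is also readable. The function names and file names are made up for illustration and are not part of Slurm.

```python
import hashlib
import sys
from pathlib import Path


def config_digest(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def configs_match(path_a, path_b):
    """Return True if the two files are byte-for-byte identical."""
    return config_digest(path_a) == config_digest(path_b)


if __name__ == "__main__":
    # Hypothetical usage (file names are placeholders):
    #   python compare_conf.py master-slurm.conf node-slurm.conf
    if len(sys.argv) != 3:
        sys.exit("usage: compare_conf.py FILE_A FILE_B")
    same = configs_match(sys.argv[1], sys.argv[2])
    print("identical" if same else "DIFFERENT")
```

This is a strict byte comparison, so even a whitespace or comment difference is reported as a mismatch. A plain diff of the two files would then show what actually changed.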
The fact that you can restart slurmd and all is well is really odd.
Suggests slurm is coming up too so
Hey,
I am running a Slurm cluster that I inherited from an employee who left, so
you will have to forgive any ignorance on my part; I am still coming up to
speed on some core concepts.
I have a vexing issue where one Slurm node consistently becomes
unresponsive. Network and DNS seem to be working