Dear Brian,
Thanks for the hints. I think you are right that this points to a network
connection issue. I've disabled firewalld on the control host, but
unfortunately that did not help. The processes stuck in CLOSE-WAIT do indeed
suggest that network connections are not being terminated properly.
I've tried
Do you have a firewall between the slurmd and the slurmctld daemons? If yes,
do you know what kind of idle timeout that firewall has for expiring idle
sessions? I ran into something somewhat similar but for me it was between the
slurmctld and slurmdbd where a recent change they made had one di