Hi Sean,

Sean Crosby wrote on 16.12.20 10:09:
> Check the firewalls between your compute node and the Slurm controller to make sure that they can contact each other. Slurmctld needs to contact the SlurmdPort (default 6818), and slurmd needs to contact the SlurmctldPort (default 6817). Also the other compute nodes need to be able to contact the new compute node on SlurmdPort.
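For reference, the reachability check described above could be sketched roughly as follows. This is only a minimal illustration, assuming the default ports quoted above are in use; the helper name `can_connect` is made up for this example and is not part of Slurm.

```python
import socket

# Default ports as quoted above (SlurmctldPort / SlurmdPort).
# Assumption: the site has not changed them in slurm.conf.
SLURMCTLD_PORT = 6817
SLURMD_PORT = 6818


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    Hypothetical helper for illustration. It only tests TCP
    reachability and does not exchange any Slurm or munge data.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

On the compute node one would call something like `can_connect("<controller host>", SLURMCTLD_PORT)`, and on the controller `can_connect("<compute node>", SLURMD_PORT)`, with the placeholders replaced by the real host names. A True result only shows that the firewall lets the TCP handshake through; it says nothing about whether authentication succeeds afterwards.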

I already did (and opened the firewalls accordingly). In the tcpdump output I can see that slurmctld opens a connection to slurmd every 100 seconds. The TCP connection is established, and then slurmctld sends a data packet of 170 bytes (which probably contains the munge stuff). Afterwards slurmd closes the connection (by sending a FIN packet) and the message "invalid credentials" appears in the logs, so this really looks like a problem with munge authentication.

Thanks for the hints though; firewalling is for sure one of the reasons why I am quite good at using tcpdump... ;-)

Olaf

--
Dipl. Inform. Olaf Gellert             email  gell...@dkrz.de
Deutsches Klimarechenzentrum GmbH      phone  +49 (0)40 460094 214
Bundesstrasse 45a                      fax    +49 (0)40 460094 270
D-20146 Hamburg, Germany               www    http://www.dkrz.de

Sitz der Gesellschaft: Hamburg
Geschäftsführer: Prof. Dr. Thomas Ludwig
Registergericht: Amtsgericht Hamburg, HRB 39784