Hi everyone,

I’m adding a new node to an existing cluster. After installing Slurm and its 
prerequisites, I synced the clocks with ntpd. ‘ntpq -p’ reports 0.0 for 
delay, offset and jitter (the Slurm head node is also the NTP server), and 
‘date’ gives identical times on the head and compute nodes. However, when I 
start slurmd, I get a munge error about the clocks being out of sync. From the 
slurmctld log:

[2020-10-27T11:02:06.511] node NEW_NODE returned to service
[2020-10-27T11:02:07.265] error: Munge decode failed: Rewound credential
[2020-10-27T11:02:07.265] ENCODED: Tue Oct 27 11:09:45 2020
[2020-10-27T11:02:07.265] DECODED: Tue Oct 27 11:02:07 2020
[2020-10-27T11:02:07.265] error: Check for out of sync clocks
[2020-10-27T11:02:07.265] error: slurm_unpack_received_msg: MESSAGE_NODE_REGISTRATION_STATUS has authentication error: Rewound credential
[2020-10-27T11:02:07.265] error: slurm_unpack_received_msg: Protocol authentication error
[2020-10-27T11:02:07.275] error: slurm_receive_msg [HEAD_NODE_IP:PORT]: Unspecified error
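
To put a number on the gap, here is a small Python sketch that subtracts the 
DECODED time from the ENCODED time in the log above. The helper name is my own 
and purely illustrative, and the parsing assumes an English locale for the day 
and month names.

```python
from datetime import datetime

# Layout of the ENCODED/DECODED timestamps as they appear in the slurmctld log.
MUNGE_TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def credential_skew_seconds(encoded: str, decoded: str) -> float:
    """Return how far the encode time is ahead of the decode time, in seconds.

    Hypothetical helper for illustration only; not part of Slurm or munge.
    """
    enc = datetime.strptime(encoded, MUNGE_TIME_FORMAT)
    dec = datetime.strptime(decoded, MUNGE_TIME_FORMAT)
    return (enc - dec).total_seconds()


skew = credential_skew_seconds(
    "Tue Oct 27 11:09:45 2020",  # ENCODED line from the log
    "Tue Oct 27 11:02:07 2020",  # DECODED line from the log
)
print(f"encode time is {skew:.0f} s ahead of decode time")
```

That works out to 458 seconds, so the credential appears to carry an encode 
time about 7 minutes 38 seconds ahead of the time at which it was decoded.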

I restarted ntp, munge and the slurm daemons on both nodes before this last 
error was generated. Any idea what’s going on here?

Thanks,
Gard