Hi Olaf,

Since you are testing Slurm, perhaps my Slurm Wiki page may be of interest to you:
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation

There is a discussion about the setup of Munge.

Best regards,
Ole

On 12/15/20 5:48 PM, Olaf Gellert wrote:
Hi all,

we are setting up a new test cluster to test some features for our
next HPC system. On one of the compute nodes we get these messages
in the log:

[2020-12-15T10:00:21.753] error: Munge decode failed: Invalid credential
[2020-12-15T10:00:21.753] auth/munge: _print_cred: ENCODED: Thu Jan 01 01:00:00 1970
[2020-12-15T10:00:21.753] auth/munge: _print_cred: DECODED: Thu Jan 01 01:00:00 1970
[2020-12-15T10:00:21.753] error: slurm_receive_msg_and_forward: g_slurm_auth_verify: REQUEST_NODE_REGISTRATION_STATUS has authentication error: Invalid authentication credential
[2020-12-15T10:00:21.753] error: slurm_receive_msg_and_forward: Protocol authentication error
[2020-12-15T10:00:21.763] error: service_connection: slurm_receive_msg: Protocol authentication error

I checked munge authentication in the usual way, so:
- time between nodes is synchronised
- munge is using same UID/GID on both sides
- "munge -c0 -z0 -n | unmunge" works on compute nodes and on slurmctld
   node
- ssh slurmcontrolnode "munge -c0 -z0 -n" | unmunge on a compute node
   works
- ssh computenode "munge -c0 -z0 -n" | unmunge on the slurmctld node
   works
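
The checks listed above could be wrapped in a few shell functions, roughly as sketched here. This is only an illustration: the function names are made up, the host argument is a placeholder, and passwordless ssh to the other node plus a "munge" user account on both sides are assumed.

```shell
#!/bin/sh
# Sketch of the Munge checks described above (function names are hypothetical).

# Encode and decode a credential on the local host.
check_local() {
    munge -c0 -z0 -n | unmunge >/dev/null
}

# Encode a credential on a remote host ($1, placeholder) and decode it here.
# Assumes passwordless ssh to that host.
check_remote_to_local() {
    ssh "$1" munge -c0 -z0 -n | unmunge >/dev/null
}

# Print local and remote clocks (epoch seconds) so skew can be compared by eye.
report_clocks() {
    printf 'local: %s\n' "$(date +%s)"
    printf '%s: %s\n' "$1" "$(ssh "$1" date +%s)"
}

# Print UID/GID of the "munge" account locally and on the remote host.
# Assumes the account is named "munge" on both sides.
report_munge_ids() {
    printf 'local: %s\n' "$(id munge)"
    printf '%s: %s\n' "$1" "$(ssh "$1" id munge)"
}
```

On a compute node one might then run, for example, `check_local && check_remote_to_local slurmcontrolnode && echo "munge OK"`. Each check function returns the exit status of unmunge, so a credential that fails to decode shows up as a non-zero status.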

So munge seems to work as far as I can tell. What else does
slurm use munge for? Are hostnames part of the authentication?
Should I be worried about the time "Thu Jan 01 01:00:00 1970"
(in the logs above)?
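
For what it is worth, "Thu Jan 01 01:00:00 1970" is simply Unix time 0 printed in a UTC+1 timezone, which would suggest the ENCODED/DECODED fields were never filled in (presumably because the decode failed) rather than pointing to a clock problem. A minimal sketch to reproduce the string, assuming GNU date and using the POSIX timezone string CET-1 as a stand-in for the cluster's local timezone (the function name is made up):

```shell
#!/bin/sh
# Print Unix time 0 as it appears in a UTC+1 timezone.
# Assumes GNU date (for "-d @0"); CET-1 is an assumed stand-in for the local TZ.
epoch_zero_cet() {
    LC_ALL=C TZ=CET-1 date -d @0 '+%a %b %d %H:%M:%S %Y'
}

epoch_zero_cet   # prints: Thu Jan 01 01:00:00 1970
```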

All machines are CentOS8, slurm is self-built 20.11.0,
munge is from CentOS8 rpm:

munge-0.5.13-1.el8.x86_64
munge-libs-0.5.13-1.el8.x86_64

Cheers, Olaf


