Hi Xaver,

We have been running configless Slurm for a number of years and are very happy with this setup. I have documented our detailed configuration on this Wiki page, which you may want to consult:

https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_configuration/#configless-slurm-setup

IHTH,
Ole

On 1/16/26 11:11, Xaver Stiensmeier via slurm-users wrote:
Hey everyone,

In the past we set up clusters with the config files on each node. Now we want to explore configless mode. Changing nothing else, we followed https://slurm.schedmd.com/configless_slurm.html and added 'enable_configless' to the config on the master:

    
    SlurmctldParameters=cloud_dns,idle_on_node_suspend,enable_configless,reconfig_on_restart

and started each worker's slurmd with the --conf-server parameter:

    # Override systemd service to set conditional path
    [Service]
    ExecStart=
    ExecStart=/usr/sbin/slurmd --conf-server=master

However, this leads to:

    slurmd: error: _fetch_child: failed to fetch remote configs: Protocol authentication error
    slurmd: error: _establish_configuration: failed to load configs. Retrying in 10 seconds.

on the workers and on the master (/var/log/slurm/slurmctld) to:

    [2026-01-16T10:00:06.681] error: Munge decode failed: Invalid credential
    [2026-01-16T10:00:06.681] auth/munge: _print_cred: ENCODED: Thu Jan 01 00:00:00 1970
    [2026-01-16T10:00:06.681] auth/munge: _print_cred: DECODED: Thu Jan 01 00:00:00 1970
    [2026-01-16T10:00:06.681] error: slurm_unpack_received_msg: [[worker]:24295] auth_g_verify: REQUEST_CONFIG has authentication error: Unspecified error
    [2026-01-16T10:00:06.681] error: slurm_unpack_received_msg: [[worker]:24295] Protocol authentication error

The munge key setup is the same as before, so I don't think anything is wrong with it, unless something changes with configless. The relevant part of slurm.conf is:

    AuthType=auth/munge
    CryptoType=crypto/munge
    AuthAltTypes=auth/jwt
    AuthAltParameters=jwt_key=/etc/slurm/jwt-secret.key

I found https://groups.google.com/g/slurm-users/c/Q7FVkhx-bOs but this seems unrelated, as both nodes can reach each other fine:

    worker:~$ nc -zv master 6817
    Connection to master (192.168.20.169) 6817 port [tcp/*] succeeded!

I tried adding more "-v" flags to the slurmd command line, but that did not yield more information. I am unsure how to debug this further. I suspect it must be a munge issue, but I am confused because this part hasn't changed.
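
Since the nc test above only shows that the port is reachable, one way to narrow this down might be to exercise munge independently of Slurm. The following is a minimal sketch, not a confirmed diagnosis. It assumes ssh access from the worker to the master and the default key location /etc/munge/munge.key; neither is stated in this thread. The helper names key_digest, compare_digests and clock_skew are made up for illustration.

```shell
#!/bin/sh
# Sketch only: host name "master", ssh access and the key path
# /etc/munge/munge.key are assumptions, and the helper names are hypothetical.

# Print the SHA-256 digest of a file (first field of the sha256sum output).
key_digest() {
    sha256sum "$1" | cut -d' ' -f1
}

# Compare two digests and print "match" or "MISMATCH".
compare_digests() {
    if [ "$1" = "$2" ]; then
        echo "match"
    else
        echo "MISMATCH"
    fi
}

# Print the absolute difference in seconds between two epoch timestamps.
clock_skew() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}

# Example usage on the worker (as root, because the key file is normally
# not readable by other users):
#
#   munge -n | unmunge               # local encode/decode round trip
#   munge -n | ssh master unmunge    # encode on the worker, decode on the master
#   compare_digests "$(key_digest /etc/munge/munge.key)" \
#       "$(ssh master sha256sum /etc/munge/munge.key | cut -d' ' -f1)"
#   clock_skew "$(date +%s)" "$(ssh master date +%s)"
```

If the cross-host unmunge fails while the local round trip succeeds, the key digests and the clock difference between the two hosts would be the next things to compare.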

Best regards,
Xaver

--
slurm-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
