Hi Nilesh,

It seems that your Munge setup isn't working. Maybe the munge.key file isn't shared on all nodes?
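
One way to narrow this down is a MUNGE round trip on the controller node: encode a credential with munge and decode it with unmunge. The sketch below assumes both client tools are on the PATH. The function name munge_roundtrip is made up for illustration and is not part of MUNGE or Slurm.

```shell
#!/bin/sh
# Hypothetical helper (not part of MUNGE or Slurm): encode an empty
# credential with munge and decode it again with unmunge on this node.
munge_roundtrip() {
    # Both client tools must be installed and on the PATH.
    if ! command -v munge >/dev/null 2>&1 || ! command -v unmunge >/dev/null 2>&1; then
        echo "munge tools not found in PATH"
        return 2
    fi
    # "munge -n" encodes a credential without payload; unmunge decodes it
    # and exits non-zero if the credential cannot be validated.
    if munge -n | unmunge >/dev/null 2>&1; then
        echo "local munge round trip OK"
    else
        echo "local munge round trip FAILED"
        return 1
    fi
}

# Run the check once; "|| true" keeps the script going after a failure.
munge_roundtrip || true
```

If the local round trip works, the same credential can be decoded on another node, for example with "munge -n | ssh node01 unmunge" (node01 is a placeholder host name). A failure there while the local test passes would point to differing munge.key files.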

I recommend that you take a look at this Wiki page:
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/
to get a complete overview of the tasks involved in setting up a Slurm cluster.

IHTH,
Ole

On 8/1/25 04:26, Dhumal, Dr. Nilesh via slurm-users wrote:
Hello,
We recently installed slurm-25 on Red Hat Linux, but the slurmctld service fails to start:
sudo systemctl start slurmctld
Job for slurmctld.service failed because the control process exited with error code. See "systemctl status slurmctld.service" and "journalctl -xeu slurmctld.service" for details.

sudo systemctl status slurmctld
× slurmctld.service - Slurm controller daemon
     Loaded: loaded (/usr/local/lib/systemd/system/slurmctld.service; enabled; preset: disabled)
     Active: failed (Result: exit-code) since Thu 2025-07-31 22:23:18 EDT; 1min 3s ago
    Process: 44317 ExecStart=/usr/local/sbin/slurmctld --systemd $SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)
    Main PID: 44317 (code=exited, status=1/FAILURE)
         CPU: 35ms

Jul 31 22:22:29 fgcu-compute01 systemd[1]: Starting Slurm controller daemon...
Jul 31 22:23:18 fgcu-compute01 slurmctld[44317]: [2025-07-31T22:23:18.679] error: If munged is up, restart with --num-threads=10
Jul 31 22:23:18 fgcu-compute01 slurmctld[44317]: [2025-07-31T22:23:18.679] error: Munge encode failed: Failed to connect to "/run/munge/mung>
Jul 31 22:23:18 fgcu-compute01 slurmctld[44317]: [2025-07-31T22:23:18.679] error: Failed to create MUNGE Credential
Jul 31 22:23:18 fgcu-compute01 slurmctld[44317]: [2025-07-31T22:23:18.679] error: Couldn't load specified plugin name for auth/munge: Plugin>
Jul 31 22:23:18 fgcu-compute01 slurmctld[44317]: [2025-07-31T22:23:18.679] error: cannot create auth context for auth/munge
Jul 31 22:23:18 fgcu-compute01 slurmctld[44317]: [2025-07-31T22:23:18.679] fatal: failed to initialize auth plugin
Jul 31 22:23:18 fgcu-compute01 systemd[1]: slurmctld.service: Main process exited, code=exited, status=1/FAILURE
Jul 31 22:23:18 fgcu-compute01 systemd[1]: slurmctld.service: Failed with result 'exit-code'.
Jul 31 22:23:18 fgcu-compute01 systemd[1]: Failed to start Slurm controller daemon.

Here is the munge service status:
munge.service - MUNGE authentication service
     Loaded: loaded (/usr/local/lib/systemd/system/munge.service; enabled; preset: disabled)
      Active: active (running) since Thu 2025-07-31 22:06:14 EDT; 19min ago
        Docs: man:munged(8)
    Main PID: 44039 (munged)
       Tasks: 4 (limit: 606218)
      Memory: 1.4M
         CPU: 18ms
      CGroup: /system.slice/munge.service
              └─44039 /usr/local/sbin/munged

Jul 31 22:06:14 fgcu-compute01 systemd[1]: Starting MUNGE authentication service...
Jul 31 22:06:14 fgcu-compute01 systemd[1]: Started MUNGE authentication service.

Any suggestions to resolve this issue are appreciated.

--
slurm-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
