Hi Nilesh,
It seems that your Munge setup isn't working. Maybe the munge.key file
isn't shared on all nodes?
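If it helps, here is a rough sketch of how one might check this. The function names are made up for illustration and are not part of Munge or Slurm; the sketch assumes the munge and unmunge client tools and sha256sum are in your PATH:

```shell
#!/bin/sh
# Hypothetical helper names; these are not part of Munge or Slurm.

# Round-trip a credential through the local munged.
# This fails if the munge client cannot reach munged on the socket it expects.
check_munge_local() {
    if ! command -v munge >/dev/null 2>&1; then
        echo "munge not found in PATH"
        return 1
    fi
    if munge -n | unmunge >/dev/null 2>&1; then
        echo "local munge round-trip OK"
    else
        echo "local munge round-trip FAILED"
        return 1
    fi
}

# Print the SHA-256 of a key file so the value can be compared across nodes.
# Reading munge.key normally requires root.
key_fingerprint() {
    sha256sum "$1" | cut -d' ' -f1
}
```

Run check_munge_local on the controller first. Then run key_fingerprint on the munge.key file of every node (the path depends on your install prefix) and confirm the values are identical. The usual cross-node test is "munge -n | ssh <othernode> unmunge". Since your log shows slurmctld failing to connect to a socket under /run/munge, it may also be worth checking that munged actually creates its socket at that location.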
I recommend that you take a look at this Wiki page:
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/
to get a complete overview of the tasks involved in setting up a Slurm
cluster.
IHTH,
Ole
On 8/1/25 04:26, Dhumal, Dr. Nilesh via slurm-users wrote:
Hello,
We recently installed Slurm 25 on Red Hat Linux, but the slurmctld
service fails to start:
sudo systemctl start slurmctld
Job for slurmctld.service failed because the control process exited with
error code.
See "systemctl status slurmctld.service" and "journalctl -xeu
slurmctld.service" for details.
sudo systemctl status slurmctld
× slurmctld.service - Slurm controller daemon
Loaded: loaded (/usr/local/lib/systemd/system/slurmctld.service;
enabled; preset: disabled)
Active: failed (Result: exit-code) since Thu 2025-07-31 22:23:18
EDT; 1min 3s ago
Process: 44317 ExecStart=/usr/local/sbin/slurmctld --systemd
$SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)
Main PID: 44317 (code=exited, status=1/FAILURE)
CPU: 35ms
Jul 31 22:22:29 fgcu-compute01 systemd[1]: Starting Slurm controller daemon...
Jul 31 22:23:18 fgcu-compute01 slurmctld[44317]: [2025-07-31T22:23:18.679] error: If munged is up, restart with --num-threads=10
Jul 31 22:23:18 fgcu-compute01 slurmctld[44317]: [2025-07-31T22:23:18.679] error: Munge encode failed: Failed to connect to "/run/munge/mung>
Jul 31 22:23:18 fgcu-compute01 slurmctld[44317]: [2025-07-31T22:23:18.679] error: Failed to create MUNGE Credential
Jul 31 22:23:18 fgcu-compute01 slurmctld[44317]: [2025-07-31T22:23:18.679] error: Couldn't load specified plugin name for auth/munge: Plugin>
Jul 31 22:23:18 fgcu-compute01 slurmctld[44317]: [2025-07-31T22:23:18.679] error: cannot create auth context for auth/munge
Jul 31 22:23:18 fgcu-compute01 slurmctld[44317]: [2025-07-31T22:23:18.679] fatal: failed to initialize auth plugin
Jul 31 22:23:18 fgcu-compute01 systemd[1]: slurmctld.service: Main process exited, code=exited, status=1/FAILURE
Jul 31 22:23:18 fgcu-compute01 systemd[1]: slurmctld.service: Failed with result 'exit-code'.
Jul 31 22:23:18 fgcu-compute01 systemd[1]: Failed to start Slurm controller daemon.
Here is the munge service status:
munge.service - MUNGE authentication service
Loaded: loaded (/usr/local/lib/systemd/system/munge.service;
enabled; preset: disabled)
Active: active (running) since Thu 2025-07-31 22:06:14 EDT; 19min ago
Docs: man:munged(8)
Main PID: 44039 (munged)
Tasks: 4 (limit: 606218)
Memory: 1.4M
CPU: 18ms
CGroup: /system.slice/munge.service
└─44039 /usr/local/sbin/munged
Jul 31 22:06:14 fgcu-compute01 systemd[1]: Starting MUNGE authentication
service...
Jul 31 22:06:14 fgcu-compute01 systemd[1]: Started MUNGE authentication
service.
Any suggestions for resolving this issue would be appreciated.