Greetings,

Reminder: I am new to SLURM.

When I run “sinfo”, all of my nodes are reported as down:

sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      4  down* radonc[01-04]

Here is what I have tried so far; none of it has helped. After each attempt the
nodes stay in the “idle” state for 2-3 minutes and then go “down” again:

systemctl restart slurmd       (on all nodes)

systemctl restart slurmctld    (on the master)

scontrol update node=radonc[01-04] state=UNDRAIN

scontrol update node=radonc[01-04] state=IDLE
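
For reference, this is how I would check what reason the controller has recorded for the nodes going down. It is only a sketch: show_down_reasons is my own helper name (not a SLURM command), and it simply wraps the standard "sinfo -R" and "scontrol show node" commands. As far as I understand, the * after “down” in the sinfo output means the controller considers the nodes not responding.

```shell
# Print the recorded "down" reasons, then the State/Reason lines for each node.
# show_down_reasons is a hypothetical helper name; node names are passed as arguments.
show_down_reasons() {
    # One line per down/drained node with the reason the controller recorded
    sinfo -R
    for node in "$@"; do
        echo "== $node =="
        # Keep only the lines that carry the node state and the down reason
        scontrol show node "$node" | grep -E 'State=|Reason='
    done
}

# Example call (run on the master):
#   show_down_reasons radonc01 radonc02 radonc03 radonc04
```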



I looked at the log file, /var/log/SlurmdLogFile.log, and saw several “Munge
decode failed: Invalid credential” errors:

[2018-05-07T12:37:20.028] error: slurm_unpack_received_msg: MESSAGE_NODE_REGISTRATION_STATUS has authentication error: Invalid credential
[2018-05-07T12:37:20.028] error: slurm_unpack_received_msg: Protocol authentication error
[2018-05-07T12:37:20.028] error: Munge decode failed: Invalid credential
[2018-05-07T12:37:20.028] error: slurm_unpack_received_msg: MESSAGE_NODE_REGISTRATION_STATUS has authentication error: Invalid credential
[2018-05-07T12:37:20.028] error: slurm_unpack_received_msg: Protocol authentication error
[2018-05-07T12:37:20.038] error: slurm_receive_msg [10.112.0.14:42140]: Unspecified error
[2018-05-07T12:37:20.038] error: slurm_receive_msg [10.112.0.5:34752]: Unspecified error
[2018-05-07T12:37:20.038] error: slurm_receive_msg [10.112.0.6:46746]: Unspecified error
[2018-05-07T12:37:20.039] error: slurm_receive_msg [10.112.0.16:50788]: Unspecified error
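
To see which machines these rejected messages come from, the addresses in the slurm_receive_msg lines can be counted. This is a minimal sketch, assuming each log entry sits on a single line in the actual file; failing_peers is my own helper name, not part of SLURM.

```shell
# Count slurm_receive_msg errors per peer address.
# Reads log text on stdin and prints "<address> <count>" for each peer.
# failing_peers is a hypothetical helper name.
failing_peers() {
    grep 'slurm_receive_msg \[' \
        | sed -E 's/.*slurm_receive_msg \[([0-9.]+):[0-9]+\].*/\1/' \
        | LC_ALL=C sort | uniq -c | awk '{print $2, $1}'
}

# Example call (log path as given in this message):
#   failing_peers < /var/log/SlurmdLogFile.log
```

Each address can then be matched against the node list to see whether every node is affected or only some of them.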


I ran the following command on every node (including the master/head node) and got
“Success” each time:

 munge -n | unmunge | grep STATUS
STATUS:           Success (0)
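
I realize the command above only shows that each machine can decode its own credentials. A check I have not run yet is to encode a credential on the master and decode it on each node, which should fail if the munge keys differ between machines. A minimal sketch, assuming passwordless ssh from the master to the nodes; check_munge_remote is my own helper name:

```shell
# Encode a credential locally and try to decode it on each remote host.
# Assumes passwordless ssh to every host passed as an argument.
# check_munge_remote is a hypothetical helper name.
check_munge_remote() {
    for host in "$@"; do
        # The pipeline's exit status is that of the remote unmunge (via ssh)
        if munge -n | ssh "$host" unmunge >/dev/null 2>&1; then
            echo "$host: OK"
        else
            echo "$host: FAILED"
        fi
    done
}

# Example call (run on the master):
#   check_munge_remote radonc01 radonc02 radonc03 radonc04
```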


How can I fix this problem?


Thank you in advance for all your help.

Eric


_____________________________________________________________________________________________________

Eric F.  Alemany
System Administrator for Research

Division of Radiation & Cancer  Biology
Department of Radiation Oncology

Stanford University School of Medicine
Stanford, California 94305

Tel: 1-650-498-7969  No Texting
Fax: 1-650-723-7382


