Greetings,

Reminder: I am new to SLURM.
When I execute “sinfo”, my nodes are shown as down:

    sinfo
    PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
    debug*    up    infinite  4     down* radonc[01-04]

This is what I have done so far, and nothing has helped. The nodes stay in the “idle” state for 2-3 minutes and then they are “down” again.

    systemctl restart slurmd       (on all nodes)
    systemctl restart slurmctld    (on master)
    scontrol update node=radonc[01-04] state=UNDRAIN
    scontrol update node=radonc[01-04] state=IDLE

I looked at the log file in /var/log/SlurmdLogFile.log and saw some “Munge decode failed: Invalid credential” errors:

    [2018-05-07T12:37:20.028] error: slurm_unpack_received_msg: MESSAGE_NODE_REGISTRATION_STATUS has authentication error: Invalid credential
    [2018-05-07T12:37:20.028] error: slurm_unpack_received_msg: Protocol authentication error
    [2018-05-07T12:37:20.028] error: Munge decode failed: Invalid credential
    [2018-05-07T12:37:20.028] error: slurm_unpack_received_msg: MESSAGE_NODE_REGISTRATION_STATUS has authentication error: Invalid credential
    [2018-05-07T12:37:20.028] error: slurm_unpack_received_msg: Protocol authentication error
    [2018-05-07T12:37:20.038] error: slurm_receive_msg [10.112.0.14:42140]: Unspecified error
    [2018-05-07T12:37:20.038] error: slurm_receive_msg [10.112.0.5:34752]: Unspecified error
    [2018-05-07T12:37:20.038] error: slurm_receive_msg [10.112.0.6:46746]: Unspecified error
    [2018-05-07T12:37:20.039] error: slurm_receive_msg [10.112.0.16:50788]: Unspecified error

I ran the following command locally on every node (including the master/head node) and got “Success” each time:

    munge -n | unmunge | grep STATUS
    STATUS: Success (0)

How can I fix this problem?

Thank you in advance for all your help.

Eric
_____________________________________________________________________________________________________
Eric F. Alemany
System Administrator for Research
Division of Radiation & Cancer Biology
Department of Radiation Oncology
Stanford University School of Medicine
Stanford, California 94305
Tel: 1-650-498-7969 (No Texting)
Fax: 1-650-723-7382