Hi Nousheen,
It seems that you have configured the nodes in slurm.conf incorrectly. I
notice this:
RealMemory=1
This means 1 megabyte of RAM; we only had this with IBM PCs back in
the 1980s :-)
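A quick way to catch this kind of slip is to scan the node lines for an
implausibly small RealMemory value (the unit is megabytes). The following is
only a minimal sketch: the helper names, the 1024 MB threshold, and the sample
node lines are assumptions made for illustration, not anything Slurm provides.

```python
import re

def real_memory_mb(node_line):
    """Return the RealMemory value (in MB) from a slurm.conf node line, or None."""
    match = re.search(r"\bRealMemory=(\d+)\b", node_line, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None

def looks_misconfigured(node_line, min_mb=1024):
    """Flag a node line whose RealMemory is missing or below min_mb.

    min_mb is a hypothetical sanity threshold, not a Slurm setting.
    """
    value = real_memory_mb(node_line)
    return value is None or value < min_mb

# Hypothetical node definitions, for illustration only.
bad = "NodeName=node01 CPUs=8 RealMemory=1 State=UNKNOWN"
good = "NodeName=node02 CPUs=8 RealMemory=64000 State=UNKNOWN"

print(looks_misconfigured(bad))
print(looks_misconfigured(good))
```
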
See how to configure nodes in
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_confi
Many thanks, William. That may have been the issue. I changed the hostname
to the FQDN and set "StorageHost=localhost", and now it seems to try
connecting to the database.
[root@mannose sushil]# cat /var/log/slurm/slurmctld.log
[2022-12-01T15:26:50.942] Job accounting information stored, but details
not g
Dear Robbert,
Thank you so much for your response. I was so focused on syncing the time
that I missed the date on one of the nodes, which was 1 day behind as you
said. I have corrected it and now I get the following output in status.
(base) [nousheen@nousheen slurm]$ systemctl status slurmctld.service
I believe that the error you need to pay attention to for this issue is this
line:
Dec 01 16:17:19 nousheen slurmctld[1631]: slurmctld: error: Check for out of
sync clocks
It looks like your compute node's clock is a full day ahead of your
controller node: Dec. 2 instead of Dec. 1. The clo
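
To make the size of the mismatch concrete, the gap between the two clocks can
be worked out from their timestamps. This is a rough sketch only: the function
names and the 5-minute tolerance are assumptions for illustration and are not
the check slurmctld itself performs, and the year in the sample timestamps is
assumed.

```python
from datetime import datetime, timedelta

def clock_skew(controller_time, node_time):
    """Return the absolute difference between two clock readings."""
    return abs(node_time - controller_time)

def clocks_out_of_sync(controller_time, node_time,
                       tolerance=timedelta(minutes=5)):
    """Report whether the skew exceeds a tolerance.

    The tolerance is a hypothetical value, not Slurm's actual threshold.
    """
    return clock_skew(controller_time, node_time) > tolerance

# Illustrative readings: controller on Dec. 1, compute node on Dec. 2.
controller = datetime(2022, 12, 1, 16, 17, 19)
node = datetime(2022, 12, 2, 16, 17, 19)

print(clock_skew(controller, node))
print(clocks_out_of_sync(controller, node))
```
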
Hi Brian,
Thanks for your answer and sorry for my late reply.
Yes, the cluster is not large, perhaps medium sized? What is considered a
small/medium/large cluster for Slurm?
I think I would go for the remote license method, to avoid what you
mentioned about keeping the slurm.conf consistent on al
Hello Everyone,
I am using Slurm version 21.08.5 and CentOS 7.
slurmd starts successfully on all compute nodes, but when I start
slurmctld on the server node it gives the following error:
(base) [nousheen@nousheen ~]$ systemctl status slurmctld.service -l
● slurmctld.service - Slurm controller da