Hi, I removed a node from the cluster, but the squeue command no longer works, and Slurm still seems to be looking for the missing node.

[root@rocks7 home]# > /var/log/slurm/slurmctld.log
[root@rocks7 home]# systemctl restart slurmctld
[root@rocks7 home]# systemctl restart slurmd
[root@rocks7 home]# rocks sync slurm
slurm_load_ctl_conf error: Unable to contact slurm controller (connect failure)
[root@rocks7 home]# cat /var/log/slurm/slurmctld.log
[2018-10-17T14:41:36.682] slurmctld version 17.11.5 started on cluster jupiter
[2018-10-17T14:41:37.212] layouts: no layout to initialize
[2018-10-17T14:41:37.216] layouts: loading entities/relations information
[2018-10-17T14:41:37.216] error: _find_node_record(751): lookup failure for compute-0-6
[2018-10-17T14:41:37.216] error: Node compute-0-6 has vanished from configuration
[2018-10-17T14:41:37.216] Recovered state of 7 nodes
[2018-10-17T14:41:37.216] Down nodes: compute-0-4
[2018-10-17T14:41:37.216] Recovered JobID=1440 State=0x1 NodeCnt=0 Assoc=59
[2018-10-17T14:41:37.216] recovered job step 1442.0
[2018-10-17T14:41:37.216] Recovered JobID=1442 State=0x1 NodeCnt=0 Assoc=76
[2018-10-17T14:41:37.216] recovered job step 1443.0
[2018-10-17T14:41:37.216] Recovered JobID=1443 State=0x1 NodeCnt=0 Assoc=77
[2018-10-17T14:41:37.216] Recovered information about 3 jobs
[2018-10-17T14:41:37.216] error: _find_node_record(751): lookup failure for compute-0-6
[2018-10-17T14:41:37.216] error: build_part_bitmap: invalid node name compute-0-6
[2018-10-17T14:41:37.217] fatal: Invalid node names in partition EMERALD
[root@rocks7 home]# cat /etc/slurm/parts
PartitionName=WHEEL RootOnly=yes Priority=1000 Nodes=ALL
PartitionName=DIAMOND AllowAccounts=monthly Nodes=compute-0-[0-1]
PartitionName=EMERALD AllowAccounts=em1,z1,z2,em4,z3,z5,z9 Nodes=compute-0-[2-5],rocks7
PartitionName=RUBY AllowAccounts=y8,y10 Nodes=compute-0-[3-5]
PartitionName=NOLIMIT AllowAccounts=nl Nodes=compute-0-[4-5]
[root@rocks7 home]#

Any ideas? It looks like a stale reference to the removed node is still lingering somewhere.

Regards,
Mahmood