We noticed that the Slurm controller (slurmctld) removes nodes that it cannot
reach. How can this be disabled?
We would like such nodes to be marked down/drain instead of being removed
from the sinfo output.

/var/log/slurm/slurmctld.log
[2022-10-25T13:10:01.500] debug:  Log file re-opened
[2022-10-25T13:10:01.589] error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution
[2022-10-25T13:10:01.589] error: slurm_set_addr: Unable to resolve "spg-ethx-f4ce"
[2022-10-25T13:10:01.589] error: slurm_get_port: Address family '0' not supported
[2022-10-25T13:10:01.589] error: _set_slurmd_addr: failure on spg-ethx-f4ce
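
To list every node affected this way, a minimal sketch, assuming the
"_set_slurmd_addr: failure on <node>" line format shown in the log above.
The function name unresolved_nodes is a hypothetical helper, not part of
Slurm.

```shell
# Hypothetical helper (not a Slurm tool): print each node name that appears
# in a "_set_slurmd_addr: failure on <node>" log line, once per node.
# Assumes the node name is the last field on that line, as in the log above.
unresolved_nodes() {
    sed -n 's/.*_set_slurmd_addr: failure on \(.*\)$/\1/p' "$1" | sort -u
}
```

Running `unresolved_nodes /var/log/slurm/slurmctld.log` should print each
affected node name once. Each name can then be checked on the controller host
with `getent hosts <name>` to confirm whether it resolves.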

cat /etc/slurm/slurm.conf | grep -i f4ce
NodeName=spg-ethx-f4ce ...
PartitionName=debug spg-ethx-f4ce ...

No output in sinfo:
sinfo -N | grep f4ce
sinfo -R | grep f4ce

slurmd -V
slurm 21.08.0
