[slurm-users] bug when using SlurmctldParameters=cloud_reg_addrs ? error: get_name_info: getnameinfo() failed: Name or service not known

Pablo Escobar Lopez Mon, 25 Oct 2021 09:58:22 -0700

Hi,

I have configured Slurm cloud scheduling for OpenStack. I am using CentOS 7
with Slurm version 20.11.8 installed from the EPEL RPMs. It is working fine,
but I am getting some strange errors in the slurm master logs, which I think
are a bug.


I am using these options in slurm.conf:
SlurmctldParameters=enable_configless,cloud_reg_addrs,idle_on_node_suspend

I set these options so that the cloud nodes work in "configless" mode and
the IP of each cloud node is automatically updated on the slurm master when
the node contacts it, as described in the docs:
https://slurm.schedmd.com/slurm.conf.html#OPT_cloud_reg_addrs
https://slurm.schedmd.com/configless_slurm.html

When the cloud nodes are shut down, I get this info using scontrol:

$>scontrol show node demo-slurm-compute-05 |grep -i NodeAddr
NodeAddr=demo-slurm-compute-05 NodeHostName=demo-slurm-compute-05 Version=20.11.8

When the cloud node boots and contacts the master, the IP is properly
updated, so the option "cloud_reg_addrs" seems to work fine. This is the
output of scontrol when a cloud node boots:

$> scontrol show node demo-slurm-compute-dynamic-05 |grep NodeAddr
NodeAddr=192.168.105.128 NodeHostName=192.168.105.128 Version=20.11.8

Still, every time a new cloud node boots and contacts the slurm master, I
get these errors in the slurm master log "slurmctld.log":

error: get_name_info: getnameinfo() failed: Name or service not known
error: slurm_auth_get_host: Lookup failed for 192.168.105.128

It seems that even though the node IP is updated on the master, slurmctld
still tries to resolve a hostname for that IP, and this triggers the error.
Despite the error, the node joins the cluster and can execute jobs.
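
To check from the master whether the IP has any reverse entry at all, a
minimal sketch along these lines could help. It assumes the logged failure
is an ordinary reverse lookup of the node IP; reverse_lookup is a
hypothetical helper name, not part of Slurm, and slurmctld's exact
getnameinfo() flags may differ from the ones used here.

```python
import socket
from typing import Optional


def reverse_lookup(ip: str) -> Optional[str]:
    """Return the hostname registered for ip, or None if there is none.

    Hypothetical helper for illustration only; not part of Slurm.
    """
    try:
        # NI_NAMEREQD: fail instead of echoing the numeric address back.
        # NI_NUMERICSERV: skip the service-name lookup for the port.
        flags = socket.NI_NAMEREQD | socket.NI_NUMERICSERV
        host, _port = socket.getnameinfo((ip, 0), flags)
        return host
    except socket.gaierror:
        # With NI_NAMEREQD set, a missing name is reported as EAI_NONAME
        # ("Name or service not known" on glibc).
        return None
```

Calling reverse_lookup("192.168.105.128") on the slurm master should return
None if neither DNS nor /etc/hosts has a reverse entry for the address,
which would be consistent with the errors above.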

Has anyone experienced this problem? Is this a bug or am I doing something
wrong with my config?

Best regards,
Pablo.
