Hi, I have configured Slurm cloud scheduling for OpenStack. I am using CentOS 7 with Slurm version 20.11.8, installed from the EPEL RPMs. It is working fine, but I am getting some strange errors in the Slurm master logs which I think are a bug.
I am using these options in slurm.conf:

    SlurmctldParameters=enable_configless,cloud_reg_addrs,idle_on_node_suspend

With these options the cloud nodes work in "configless" mode, and the IP for each cloud node is automatically updated on the Slurm master when the node contacts it, as described in the docs:

    https://slurm.schedmd.com/slurm.conf.html#OPT_cloud_reg_addrs
    https://slurm.schedmd.com/configless_slurm.html

When the cloud nodes are shut down I get this info using scontrol:

    $> scontrol show node demo-slurm-compute-05 | grep -i NodeAddr
    NodeAddr=demo-slurm-compute-05 NodeHostName=demo-slurm-compute-05 Version=20.11.8

When the cloud node boots and contacts the master, the IP is properly updated, so the "cloud_reg_addrs" option seems to work fine. This is the output of scontrol when a cloud node boots:

    $> scontrol show node demo-slurm-compute-dynamic-05 | grep NodeAddr
    NodeAddr=192.168.105.128 NodeHostName=192.168.105.128 Version=20.11.8

But every time a new cloud node boots and contacts the Slurm master, I still get these errors in the Slurm master log "slurmctld.log":

    error: get_name_info: getnameinfo() failed: Name or service not known
    error: slurm_auth_get_host: Lookup failed for 192.168.105.128

It seems that even though the node IP is updated on the master, slurmctld still tries to do a reverse lookup of that IP to get a hostname, and the failed lookup triggers this error. Despite the error, the node joins the cluster and can execute jobs.

Has anyone experienced this problem? Is this a bug, or am I doing something wrong with my config?

Best regards,
Pablo.
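
For anyone trying to reproduce the lookup failure outside of slurmctld, here is a minimal Python sketch that asks the system resolver for the hostname behind an IP. It assumes the failing call is a reverse lookup that requires a name (the NI_NAMEREQD flag); that is a guess based on the "Name or service not known" message, not something confirmed from the Slurm source. The function name reverse_lookup is hypothetical.

```python
import socket


def reverse_lookup(ip, port=0):
    """Return the hostname the resolver gives for ip, or None if there is none.

    NI_NAMEREQD makes getnameinfo() fail instead of falling back to the
    numeric address when no name is found (assumed to mirror what the
    master does when it logs "getnameinfo() failed").
    """
    try:
        host, _service = socket.getnameinfo((ip, port), socket.NI_NAMEREQD)
        return host
    except socket.gaierror:
        # No reverse (PTR) record, no /etc/hosts entry, or not a valid IP.
        return None


if __name__ == "__main__":
    # IP taken from the log lines above; run this on the Slurm master.
    print(reverse_lookup("192.168.105.128"))
```

If this prints None on the master, the resolver there has no name for the cloud node's IP (no PTR record and no /etc/hosts entry), which would be consistent with the two log lines.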