On Wed, Dec 13, 2023 at 08:16:39PM +0000, Jackson, Gary L. wrote:
Hi Gary,

> The SlurmctldHost value is set like the following in my slurm.conf:
>
> SlurmctldHost=host0,host1
>
> That seems to be legal according to the documentation. However, I get error
> messages like the following:
>
> $ srun id
>
> srun: error: get_addr_info: getaddrinfo() failed: Name or service not known
> srun: error: slurm_set_addr: Unable to resolve "host0,host1"
> srun: error: Unable to establish control machine address
> srun: error: Unable to allocate resources: Address already in use
...
> What’s going on?
Not sure, but I've seen such errors when using a node name that was not "registered" via NodeName or discovered otherwise. Looking at the code at the time revealed that the message is IMHO misleading: Slurm does __not__ make a DNS lookup; it simply greps its internal list of known nodes, and if the name is not found, it emits such messages.

Other options: try a separate SlurmctldHost=... line for each host to rule out a format error. Not sure whether it supports ranges, too (like SlurmctldHost=host[0-1]).

Last but not least, regarding 'Address already in use': checking whether another instance or something else is already listening on the related port shouldn't hurt ...

Have fun,
jel.
--
Otto-von-Guericke University          http://www.cs.uni-magdeburg.de/
Department of Computer Science        Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany              Tel: +49 391 67 52768
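For illustration, a minimal slurm.conf sketch of the one-entry-per-line form suggested above. It reuses the placeholder names host0 and host1 from Gary's message and is untested against his setup. According to the slurm.conf man page, when SlurmctldHost is given more than once, the first entry is the primary controller and the following entries act as backups, in order.

```
# Primary controller: slurmctld runs on host0 by default (placeholder name)
SlurmctldHost=host0
# Backup controller: takes over if host0 fails (placeholder name)
SlurmctldHost=host1
```

With this layout no single entry contains a comma, so nothing should try to resolve the combined string "host0,host1" as one host name.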