With Configless Slurm you can use a DNS SRV record to point to your slurmctld server. We're in the process of testing various CentOS 8 (EL8) alternatives (AlmaLinux, Rocky Linux, CentOS Stream 8), and I've found strange behavior on all EL8 systems:

On CentOS 7.9 compute nodes and servers the "host" command returns the DNS SRV record without the DNS domain having to be appended:

$ host -t SRV _slurmctld._tcp
_slurmctld._tcp.nifl.fysik.dtu.dk has SRV record 0 0 6817 que.nifl.fysik.dtu.dk.

whereas the "dig" command doesn't return the answer:

$ dig +short -t SRV -n _slurmctld._tcp

On all EL8 and Fedora 34 (FC34) systems in our network, neither "host" nor "dig" returns an answer. The DNS information is returned only if the fully qualified name is given:

$ host -t SRV _slurmctld._tcp.nifl.fysik.dtu.dk.
$ dig +short -t SRV -n _slurmctld._tcp.nifl.fysik.dtu.dk.

Needless to say, the correct DNS domain is configured in /etc/resolv.conf.
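
To make the check reproducible on other sites, here is a minimal sketch that lists the fully qualified names one would expect the search list to produce for the short SRV name. The function name srv_candidates is hypothetical (it is not part of Slurm or BIND), and it assumes a conventional resolv.conf with a "search" or "domain" line:

```shell
#!/bin/sh
# srv_candidates NAME [RESOLV_CONF]
# Hypothetical helper: print NAME with each search domain appended,
# one fully qualified name (with trailing dot) per line.
srv_candidates() {
    name=$1
    conf=${2:-/etc/resolv.conf}
    awk -v name="$name" '
        # If several "search"/"domain" lines exist, the last one wins.
        $1 == "search" || $1 == "domain" {
            n = 0
            for (i = 2; i <= NF; i++) list[++n] = $i
        }
        END {
            for (i = 1; i <= n; i++) printf "%s.%s.\n", name, list[i]
        }
    ' "$conf"
}

# Example usage (needs network access, so it is left commented out):
# srv_candidates _slurmctld._tcp | while read -r fqdn; do
#     dig +short -t SRV "$fqdn"
# done
```

Note that, according to its man page, "dig" does not use the resolv.conf search list unless "+search" is given, so "dig +search +short -t SRV _slurmctld._tcp" may be a fairer comparison with "host". Whether that changes the result on EL8 is something I have not verified.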

Additionally, I have access to the Slurm cluster at another university, and on their EL7 nodes "host" works as expected, but on an AlmaLinux 8.4 node it doesn't. So I believe the DNS SRV record problem is not due to our particular network or DNS setup.

Question: Can other sites with any EL8 nodes and Configless Slurm test the "host" command as shown above?

Question: Does anyone know why the "host" command apparently changed behavior from EL7 to EL8 (and FC34) as regards the lookup of SRV records?

This issue is tracked in Slurm bug https://bugs.schedmd.com/show_bug.cgi?id=11878#c2

Thanks,
Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark
