We answered in parallel :)
I usually prefer to avoid modifying system-managed files because system updates could reset 'em. Since systemd allows overrides, I chose to use 'em :)
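
A minimal sketch of such an override, assuming the drop-in is created with "systemctl edit slurmctld.service" (which by default writes the file at the path shown in the comment; the path and file name are assumptions, not taken from this thread). The same fragment should apply to slurmd.service and slurmdbd.service. Dependencies can only be added in a drop-in, not removed, so the packaged "After=network.target" stays in place, which is harmless:

```ini
# Assumed location: /etc/systemd/system/slurmctld.service.d/override.conf
# Adds to the packaged unit instead of editing /usr/lib/systemd/system/slurmctld.service
[Unit]
# Pull network-online.target into the boot transaction...
Wants=network-online.target
# ...and start slurmctld only after it has been reached
After=network-online.target
```

If the file is created by hand rather than with "systemctl edit", run "systemctl daemon-reload" afterwards. Also note that network-online.target only delays startup if a wait-online service (on EL 8 typically NetworkManager-wait-online.service) is enabled.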

On 23/07/2021 12:52, Ole Holm Nielsen wrote:
On 7/23/21 12:29 PM, Riccardo Sucapane wrote:
I am using Slurm as a workload manager on a system
with a master and 3 nodes.
The operating system is the recent Rocky Linux 8.4,
and the Slurm version is 20.11.8, taken from the EPEL
repository.
Everything works correctly once the system is up, and running
"systemctl start slurmctld" by hand works fine, but at boot the
slurmctld daemon does not start on the master machine and reports a series of errors. Without quoting the whole slurmctld.log, the recurring error is the following:

[2021-07-23T09:58:01.932] error: get_addr_info: getaddrinfo() failed: Name or service not known
[2021-07-23T09:58:01.932] error: slurm_set_addr: Unable to resolve "blade01"
[2021-07-23T09:58:01.932] error: slurm_get_port: Address family '0' not supported
[2021-07-23T09:58:01.932] error: _set_slurmd_addr: failure on blade01

This seems to be a DNS name resolution error.

This could be due to slurmctld starting before the server's network is completely up!  We have seen this with slurmd on EL 8.4 nodes, and I found a solution, see https://bugs.schedmd.com/show_bug.cgi?id=11878#c5.  This will be fixed in Slurm 21.08.

In /usr/lib/systemd/system/slurmd.service and /usr/lib/systemd/system/slurmctld.service you should replace "network.target" with "network-online.target".  Reboot to test it.

In this case, for simplicity, I have set
"AccountingStorageType=accounting_storage/none" in the slurm.conf file.
Using slurmdbd/mariadb support also works without problems, but slurmctld
still does not start on boot.
Also, blade01 in the log above is the hostname of one of the nodes.

You should probably fix /usr/lib/systemd/system/slurmdbd.service as well.

/Ole


--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
