Actually, you were right. Setting LLMNR=no in
/etc/systemd/resolved.conf on g-vm03, which turns off link-local
multicast name resolution, sped up getent hosts ougaXX
significantly and solved the issue. Thanks!
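For reference, the change amounts to roughly the following on the
affected host (restarting systemd-resolved afterwards to apply it is
assumed):

    # /etc/systemd/resolved.conf
    [Resolve]
    LLMNR=no

    # apply the change
    systemctl restart systemd-resolved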
On 5/13/25 15:50, John Hearns via slurm-users wrote:
I think that looks OK. Forget my response.
On Tue, 13 May 2025 at 14:09, Tilman Hoffbauer via slurm-users
<slurm-users@lists.schedmd.com> wrote:
Thank you for your response. nslookup on e.g. ouga20 is instant,
while getent hosts ouga20 takes about 1.6 seconds from g-vm03. It is
about the same speed for ouga20 looking up g-vm03.
Is this too slow?
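Measured roughly like this (exact invocations may vary slightly):

    time nslookup ouga20      # instant
    time getent hosts ouga20  # ~1.6 s from g-vm03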
On 5/13/25 15:01, John Hearns wrote:
Stupid response from me. A loooong time ago I had issues with
slow response on PBS. The cause was name resolution.
On your setup, is name resolution OK? Can you look up host names
without delays?
On Tue, 13 May 2025 at 13:50, Tilman Hoffbauer via slurm-users
<slurm-users@lists.schedmd.com> wrote:
Hello,
we are running a SLURM-managed cluster with one control node
(g-vm03) and 26 worker nodes (ouga[03-28]) on Rocky 8. We
recently updated from 20.11.9 through 23.02.8 to 24.11.0 and
then 24.11.5. Since then, we have been experiencing performance
issues: squeue and scontrol ping are slow to respond and
sometimes return "timeout on send/recv" messages, even with
only very few parallel requests. We did not experience these
issues with SLURM 20.11.9; we did not check the intermediate
version 23.02.8 in detail. In the log of
slurmctld, we can also find messages like
slurmctld: error: slurm_send_node_msg: [socket:[1272743]]
slurm_bufs_sendto(msg_type=RESPONSE_JOB_INFO) failed:
Unexpected missing socket error
We thus implemented all recommendations from the high
throughput documentation and did see improvements (most notably
from increasing the maximum number of open files and increasing
MessageTimeout and TCPTimeout).
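To give a sense of what that looked like, roughly along these lines
(values illustrative, not necessarily our exact numbers):

    # slurm.conf changes per the high throughput docs
    MessageTimeout=30
    TCPTimeout=15

    # systemd override for slurmctld (raised open file limit)
    [Service]
    LimitNOFILE=65536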
For debugging, I attached the slurm.conf, the sdiag output
(the server thread count is almost always 1 and sometimes
increases to 2), the slurmctld log and the slurmdbd log from
a time of high load.
We would be very thankful for any input on how to restore the
old performance.
Kind Regards,
Tilman Hoffbauer
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com