Thank you for your response. nslookup on e.g. ouga20 is instant, but getent hosts ouga20 takes about 1.6 seconds from g-vm03. Looking up g-vm03 from ouga20 takes about the same time.
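
To reproduce the measurement outside the shell, here is a minimal Python sketch. It assumes that socket.getaddrinfo() follows the system resolver path (nsswitch.conf, /etc/hosts, then DNS) much like getent does, whereas nslookup queries the DNS server directly. The helper name time_lookup and the repeat count are only illustrative.

```python
import socket
import time


def time_lookup(host, repeats=3):
    """Return the wall-clock seconds taken by each getaddrinfo() call on host."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        try:
            # Resolves through the libc resolver, not by querying DNS directly.
            socket.getaddrinfo(host, None)
        except socket.gaierror:
            # A failed lookup is still timed: slow failures matter as well.
            pass
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Host names taken from the cluster described in this thread.
    for host in ("ouga20", "g-vm03"):
        print(host, ["%.3f s" % t for t in time_lookup(host)])
```

If only the first call is slow and the repeats are fast, some caching layer is probably involved; if every call takes the same time, each lookup is likely hitting the same slow source.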

Is this too slow?

On 5/13/25 15:01, John Hearns wrote:
Stupid response from me.  A loooong time ago I had issues with slow response on PBS. The cause was name resolution.

On your setup is name resolution OK? Can you look up host names without delays?

On Tue, 13 May 2025 at 13:50, Tilman Hoffbauer via slurm-users <slurm-users@lists.schedmd.com> wrote:

    Hello,

    we are running a SLURM-managed cluster with one control node
    (g-vm03) and 26 worker nodes (ouga[03-28]) on Rocky 8. We recently
    updated from 20.11.9 through 23.02.8 to 24.11.0 and then 24.11.5.
    Since then, we are experiencing performance issues - squeue and
    scontrol ping are slow to react and sometimes deliver "timeout on
    send/recv" messages, even with only very few parallel requests. We
    did not experience these issues with SLURM 20.11.9, but we did
    not check the intermediate version 23.02.8 in detail. In
    the log of slurmctld, we can also find messages like

    slurmctld: error: slurm_send_node_msg: [socket:[1272743]]
    slurm_bufs_sendto(msg_type=RESPONSE_JOB_INFO) failed: Unexpected
    missing socket error

    We thus implemented all recommendations from the high throughput
    documentation, and did achieve improvements with it (most notably
    by increasing the maximum number of open files and increasing
    MessageTimeout and TCPTimeout).

    For debugging, I attached the slurm.conf, the sdiag output (the
    server thread count is almost always 1 and sometimes increases to
    2), the slurmctld log and the slurmdbd log from a time of high load.

    We would be very thankful for any input on how to restore the old
    performance.

    Kind Regards,
    Tilman Hoffbauer



-- slurm-users mailing list -- slurm-users@lists.schedmd.com
    To unsubscribe send an email to slurm-users-le...@lists.schedmd.com