Re: [slurm-users] Database cluster

2024-01-24 Thread Henkel, Andreas
Hi Daniel, We run a simple Galera-MySQL cluster and have HAProxy running on all clients to steer requests (round-robin) to one of the DB nodes that answer the health check properly. Best, Andreas On 23.01.2024 at 15:35, Daniel L'Hommedieu wrote: Xand, Thanks - that’s great to hear.
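
The routing idea in the message above (round-robin over only those DB nodes that pass a health check) can be illustrated with a minimal Python sketch. This is not HAProxy configuration and not the poster's actual setup: the names `round_robin_healthy` and `tcp_health_check`, the node names, and the plain TCP connect on port 3306 are all assumptions for illustration. A real Galera health check would normally also inspect the node's cluster state rather than just the open port.

```python
import socket
from itertools import cycle
from typing import Callable, Iterable, Iterator


def tcp_health_check(host: str, port: int = 3306, timeout: float = 1.0) -> bool:
    """Hypothetical check: healthy means a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def round_robin_healthy(
    nodes: Iterable[str], is_healthy: Callable[[str], bool]
) -> Iterator[str]:
    """Yield nodes in round-robin order, skipping any that fail the health check.

    Raises RuntimeError when a full pass over all nodes finds none healthy.
    """
    node_list = list(nodes)
    if not node_list:
        raise ValueError("at least one node is required")
    ring = cycle(node_list)
    while True:
        # Try each node at most once per request before giving up.
        for _ in range(len(node_list)):
            node = next(ring)
            if is_healthy(node):
                yield node
                break
        else:
            raise RuntimeError("no healthy database node available")


if __name__ == "__main__":
    # Hypothetical node names; replace with real hosts to try the TCP check.
    picker = round_robin_healthy(["db1", "db2", "db3"], tcp_health_check)
    try:
        print("next request goes to:", next(picker))
    except RuntimeError as exc:
        print(exc)
```

Passing the health check in as a function keeps the selection logic independent of how health is determined, so a stricter check can be swapped in without touching the rotation.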

[slurm-users] Slurm version 23.11.3 is now available

2024-01-24 Thread Tim McMullan
We are pleased to announce the availability of Slurm version 23.11.3. The 23.11.3 release fixes a single issue related to the new Debian package support. For 23.11.2, the Debian changelog was not updated correctly and would generate packages as 23.11.1. This release will correctly generate 23.11.3 p

Re: [slurm-users] slurmstepd: error: load_ebpf_prog: BPF load error (No space left on device). Please check your system limits (MEMLOCK).

2024-01-24 Thread Cristóbal Navarro
Hi, A few minutes ago I recompiled the cgroups_v2 plugin from Slurm with the fix included, replaced the old cgroups_v2.{a,la,so} files with the new ones in /usr/lib/slurm, and now jobs work properly on that node. Many thanks for all the help. Indeed, in a few months we will update to the most recent 2

Re: [slurm-users] slurmstepd: error: load_ebpf_prog: BPF load error (No space left on device). Please check your system limits (MEMLOCK).

2024-01-24 Thread Tim Schneider
Hi, I just tested with 23.02.7-1 and the issue is gone. So it seems like the patch got released. Best, Tim On 1/24/24 16:55, Stefan Fleischmann wrote: On Wed, 24 Jan 2024 12:37:04 -0300 Cristóbal Navarro wrote: Many thanks One question? Do we have to apply this patch (and recompile slurm

Re: [slurm-users] slurmstepd: error: load_ebpf_prog: BPF load error (No space left on device). Please check your system limits (MEMLOCK).

2024-01-24 Thread Charles Hedrick
Since they took the patch, it's not needed if you're using the version they fixed. However, it looks like they haven't released that version yet. The patch is to slurmd. You don't need it on the controller. If you're only having problems with some systems, you can put it just on those systems, bu

Re: [slurm-users] slurmstepd: error: load_ebpf_prog: BPF load error (No space left on device). Please check your system limits (MEMLOCK).

2024-01-24 Thread Cristóbal Navarro
Many thanks. One question: do we have to apply this patch (and recompile Slurm, I guess) only on the compute node with problems? Also, I noticed the patch now appears as "obsolete", is that ok? On Wed, Jan 24, 2024 at 9:52 AM Stefan Fleischmann wrote: > Turns out I was wrong, this is not a problem

Re: [slurm-users] Issues with Slurm 23.11.1

2024-01-24 Thread Fokke Dijkstra
Dear Brian, Thanks for the hints, I think you are correctly pointing at some network connection issue. I've disabled firewalld on the control host, but that unfortunately did not help. The processes stuck in CLOSE-WAIT suggest indeed that network connections are not properly terminated. I've tried

Re: [slurm-users] slurm-config on NFS-volume

2024-01-24 Thread Steffen Grunewald
On Wed, 2024-01-24 at 14:34:02 +0100, Steffen Grunewald wrote: > > After=network.target munge.service autofs.service Also, probably the more important change, RequiresMountsFor=/home/slurm > because my /home directories are automounted and /etc/slurm is pointing to > /home/slurm/etc Apologies

Re: [slurm-users] slurm-config on NFS-volume

2024-01-24 Thread Steffen Grunewald
On Wed, 2024-01-24 at 13:01:39 +, Werf, C.G. van der (Carel) wrote: > Hi, > > Among other clusters, I have a simple cluster with 2 nodes, running slurm. > > 1 node runs : mysqld, slurmdbd, slurmctld and slurmd. > The other node, only runs slurmd. > Slurm config is in node1: /etc/slurm. A copy

Re: [slurm-users] slurm-config on NFS-volume

2024-01-24 Thread Loris Bennett
Hi Carel, "Werf, C.G. van der (Carel)" writes: > Hi, > > Among other clusters, I have a simple cluster with 2 nodes, running slurm. > > 1 node runs : mysqld, slurmdbd, slurmctld and slurmd. > The other node, only runs slurmd. > Slurm config is in node1: /etc/slurm. A copy of the config is in >

[slurm-users] slurm-config on NFS-volume

2024-01-24 Thread Werf, C.G. van der (Carel)
Hi, Among other clusters, I have a simple cluster with 2 nodes, running slurm. 1 node runs : mysqld, slurmdbd, slurmctld and slurmd. The other node, only runs slurmd. Slurm config is in node1: /etc/slurm. A copy of the config is in node2:/etc/slurm. This slurm configuration runs ok. But, as I am