[slurm-users] slumctld don't start at boot

2021-07-23 Thread Riccardo Sucapane
Hello everyone, I am using Slurm as a workload manager on a system with a master and 3 nodes. The operating system is the recent Rocky Linux 8.4, and Slurm is version 20.11.8 from the EPEL repository. Everything works correctly and when the system is started the command "syste

Re: [slurm-users] 4 sockets but "

2021-07-23 Thread Ole Holm Nielsen
On 7/23/21 1:24 PM, Diego Zuccato wrote: On 23/07/2021 13:15, Ole Holm Nielsen wrote: But it's not showing job IDs or users :( That is really strange! pestat obtains usernames and job IDs from the squeue command. Do you get this information from "squeue -t running"? $ squeue -t runnin
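The exchange above hinges on pestat reading job IDs and usernames from squeue output. The following is a minimal Python sketch of that kind of parsing, not pestat's actual code. It assumes squeue's default column layout (JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)); the sample output and the helper name `parse_squeue` are made up for illustration.

```python
def parse_squeue(text: str) -> list[dict[str, str]]:
    """Turn default-format `squeue` output into one dict per job.

    Assumes the first non-empty line is the header and that job names
    contain no spaces (a naive whitespace split would misalign them).
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        # maxsplit keeps any trailing text inside the last column (NODELIST(REASON))
        fields = ln.split(maxsplit=len(header) - 1)
        rows.append(dict(zip(header, fields)))
    return rows


# Hypothetical output of `squeue -t running` on a small cluster
SAMPLE = """\
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
              1234      main   my_job    alice  R       5:02      1 node01
              1235      main    other      bob  R      10:11      2 node[02-03]
"""

for job in parse_squeue(SAMPLE):
    print(job["JOBID"], job["USER"], job["NODELIST(REASON)"])
```

For scripting, asking squeue for only the needed columns, e.g. `squeue -h -t running -o "%i %u"` (no header; job ID and user name), avoids depending on the default layout at all.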

Re: [slurm-users] 4 sockets but "

2021-07-23 Thread Diego Zuccato
On 23/07/2021 13:15, Ole Holm Nielsen wrote: But it's not showing job IDs or users :( That is really strange! pestat obtains usernames and job IDs from the squeue command. Do you get this information from "squeue -t running"? $ squeue -t running JOBID PARTITION NAME

Re: [slurm-users] 4 sockets but "

2021-07-23 Thread Ole Holm Nielsen
On 7/23/21 1:15 PM, Ole Holm Nielsen wrote: On 7/23/21 1:07 PM, Diego Zuccato wrote: Well, Slurm reports the 15-minute load average.  I guess users will have to learn that, because we can't print help information every time. They'd probably omit reading it anyway... Actually, I found a bit of

Re: [slurm-users] 4 sockets but "

2021-07-23 Thread Ole Holm Nielsen
On 7/23/21 1:07 PM, Diego Zuccato wrote: Well, Slurm reports the 15-minute load average.  I guess users will have to learn that, because we can't print help information every time. They'd probably omit reading it anyway... Actually, I found a bit of unused space below the CPUload heading, so I

Re: [slurm-users] slumctld don't start at boot

2021-07-23 Thread Riccardo Sucapane
Yes, that was the problem. Thanks everyone for the help. Greetings, Riccardo On Fri, 23 Jul 2021 at 13:04, Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> wrote: > On 7/23/21 1:00 PM, Diego Zuccato wrote: > > We answered in parallel :) > > I usually prefer to avoid modifying system-

Re: [slurm-users] 4 sockets but "

2021-07-23 Thread Diego Zuccato
On 23/07/2021 13:01, Ole Holm Nielsen wrote: Well, Slurm reports the 15-minute load average.  I guess users will have to learn that, because we can't print help information every time. They'd probably omit reading it anyway... Actually, I found a bit of unused space below the CPUload head

Re: [slurm-users] slumctld don't start at boot

2021-07-23 Thread Ole Holm Nielsen
On 7/23/21 1:00 PM, Diego Zuccato wrote: We answered in parallel :) I usually prefer to avoid modifying system-managed files because system updates could reset 'em. Since systemd allows overrides, I chose to use 'em :) I agree with you! The permanent fix will change those Systemd files in 2

Re: [slurm-users] 4 sockets but "

2021-07-23 Thread Ole Holm Nielsen
On 7/23/21 12:43 PM, Ole Holm Nielsen wrote: On 7/23/21 12:36 PM, Diego Zuccato wrote: I believe that slurmd reports the 15 minute CPU load average to the slurmctld, only.  So you got this information already. Yup. It's just unexpected: if you don't know, you run pestat and see that an idle nod
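Diego's surprise (an idle node showing a high load) follows from the averaging window: a long-window load average keeps decaying for many minutes after a job ends. On Linux the kernel exposes the 1-, 5- and 15-minute averages in /proc/loadavg. This minimal Python sketch parses that format from a made-up sample line; `LoadAvg` and `parse_loadavg` are illustrative names, and which average slurmd actually reports is taken from the thread, not verified here.

```python
from typing import NamedTuple


class LoadAvg(NamedTuple):
    one: float      # 1-minute average
    five: float     # 5-minute average
    fifteen: float  # 15-minute average


def parse_loadavg(text: str) -> LoadAvg:
    """Parse the first three fields of a /proc/loadavg line.

    The line looks like "0.52 0.58 0.59 1/467 12345": three load averages,
    runnable/total scheduling entities, and the most recently created PID.
    """
    fields = text.split()
    if len(fields) < 3:
        raise ValueError(f"unexpected /proc/loadavg content: {text!r}")
    return LoadAvg(*(float(x) for x in fields[:3]))


# Hypothetical snapshot of a node that went idle a few minutes ago after a
# busy job: the short window has dropped, the long window is still high.
SAMPLE = "0.15 2.40 3.85 1/467 12345"
la = parse_loadavg(SAMPLE)
print(f"1m={la.one:.2f} 5m={la.five:.2f} 15m={la.fifteen:.2f}")
```

On a live node, Python's `os.getloadavg()` returns the same three values without parsing the file.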

Re: [slurm-users] slumctld don't start at boot

2021-07-23 Thread Diego Zuccato
We answered in parallel :) I usually prefer to avoid modifying system-managed files because system updates could reset 'em. Since systemd allows overrides, I chose to use 'em :) On 23/07/2021 12:52, Ole Holm Nielsen wrote: On 7/23/21 12:29 PM, Riccardo Sucapane wrote: I am using Slurm a

Re: [slurm-users] slumctld don't start at boot

2021-07-23 Thread Diego Zuccato
Hi Riccardo. I've had a similar problem (slurm.conf is served via an NFS share). I just modified the slurmd unit: # systemctl edit slurmd [Unit] Requires=network-online.target After=home.mount HIH Diego On 23/07/2021 12:29, Riccardo Sucapane wrote: Hello everyone, I am using Slurm as a workloa
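Diego's fix uses a systemd drop-in instead of editing the packaged unit file, so package updates don't overwrite it. `systemctl edit slurmd` stores such an override in /etc/systemd/system/slurmd.service.d/override.conf. The Python sketch below writes the same kind of file non-interactively. The function name `write_override` and its `root` parameter (used here to target a temporary directory instead of the real /etc) are illustrative assumptions; the settings are copied from Diego's message.

```python
import tempfile
from pathlib import Path


def write_override(unit: str, settings: dict[str, str], root: str = "/") -> Path:
    """Write [Unit] settings to <root>/etc/systemd/system/<unit>.service.d/override.conf.

    A dict allows one value per key; space-separate multiple targets
    (e.g. "network-online.target home.mount") if a key needs several.
    """
    drop_in_dir = Path(root, "etc/systemd/system", f"{unit}.service.d")
    drop_in_dir.mkdir(parents=True, exist_ok=True)
    body = "[Unit]\n" + "".join(f"{key}={value}\n" for key, value in settings.items())
    path = drop_in_dir / "override.conf"
    path.write_text(body)
    return path


# Demo against a throwaway directory instead of the real /etc.
with tempfile.TemporaryDirectory() as tmp:
    path = write_override(
        "slurmd",
        {"Requires": "network-online.target", "After": "home.mount"},
        root=tmp,
    )
    demo_text = path.read_text()
    demo_rel = path.relative_to(tmp).as_posix()

print(demo_rel)
print(demo_text, end="")
```

On a real system, run `systemctl daemon-reload` after writing the file (`systemctl edit` does this for you). Note that Requires= alone does not order units; when start-up ordering matters it is normally paired with a matching After= entry, such as After=network-online.target.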

Re: [slurm-users] slumctld don't start at boot

2021-07-23 Thread Ole Holm Nielsen
On 7/23/21 12:29 PM, Riccardo Sucapane wrote: I am using Slurm as a workload manager on a system with a master and 3 nodes. The operating system is the recent Rocky Linux 8.4, and Slurm is version 20.11.8 from the EPEL repository. Everything works correctly and when the syst

Re: [slurm-users] 4 sockets but "

2021-07-23 Thread Ole Holm Nielsen
Hi Diego, On 7/23/21 12:36 PM, Diego Zuccato wrote: I believe that slurmd reports the 15 minute CPU load average to the slurmctld, only.  So you got this information already. Yup. It's just unexpected: if you don't know, you run pestat and see that an idle node does have a very high load :) My

Re: [slurm-users] 4 sockets but "

2021-07-23 Thread Diego Zuccato
Hi Loris. On 23/07/2021 09:05, Loris Bennett wrote: We use both Zabbix and pestat. Zabbix gives us general information on the state of the nodes and file systems, and we have added some Slurm metrics, such as number of jobs pending, amount of memory pending, number of GPUs pending, etc.
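Loris mentions feeding Slurm metrics such as pending jobs and pending memory into Zabbix. One hypothetical way to compute those two is to aggregate the output of `squeue -h -t PENDING -o "%i %m"` (job ID and minimum requested memory). The sketch assumes that two-column format, accepts memory values with or without a K/M/G/T suffix (a bare number is treated as MB), and counts each job's value as its whole request, which ignores the per-CPU versus per-node distinction and multi-node jobs. The helper names and sample lines are invented.

```python
# Conversion factors to megabytes for squeue-style memory suffixes (assumed).
UNIT_TO_MB = {"K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 * 1024}


def mem_to_mb(value: str) -> float:
    """Convert a memory string like "4G" or "500M" to MB."""
    value = value.strip().upper()
    if value and value[-1] in UNIT_TO_MB:
        return float(value[:-1]) * UNIT_TO_MB[value[-1]]
    return float(value)  # assume a bare number is already in MB


def pending_metrics(text: str) -> tuple[int, float]:
    """Return (number of pending jobs, total requested memory in MB)."""
    jobs, total_mb = 0, 0.0
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        jobs += 1
        total_mb += mem_to_mb(parts[1])
    return jobs, total_mb


# Hypothetical output of `squeue -h -t PENDING -o "%i %m"`
SAMPLE = """\
2001 4G
2002 500M
2003 16G
"""

count, mem_mb = pending_metrics(SAMPLE)
print(count, mem_mb)
```

A Zabbix agent UserParameter could run a script like this and report each number as a separate item.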

Re: [slurm-users] 4 sockets but "

2021-07-23 Thread Loris Bennett
Hi Ole, Ole Holm Nielsen writes: > Hi Loris, > > On 7/23/21 9:05 AM, Loris Bennett wrote: >> We use both Zabbix and pestat. Zabbix gives us general information on >> the state of the nodes and file systems, and we have added some Slurm >> metrics, such as number of jobs pending, amount of memor

Re: [slurm-users] 4 sockets but "

2021-07-23 Thread Ole Holm Nielsen
Hi Loris, On 7/23/21 9:05 AM, Loris Bennett wrote: We use both Zabbix and pestat. Zabbix gives us general information on the state of the nodes and file systems, and we have added some Slurm metrics, such as number of jobs pending, amount of memory pending, number of GPUs pending, etc. This ha

Re: [slurm-users] 4 sockets but "

2021-07-23 Thread Loris Bennett
Ole Holm Nielsen writes: > Hi Diego, > > On 7/23/21 8:16 AM, Diego Zuccato wrote: >>> The Configless Slurm (https://slurm.schedmd.com/configless_slurm.html) from >>> 20.02 makes distribution of slurm.conf really simple. >> Eager to see it in Debian :) > > IMHO, there ought to be a community effor