Re: [slurm-users] srun --mem issue

2022-12-08 Thread Loris Bennett
Ryan Novosielski writes:
>> On Dec 8, 2022, at 21:30, Kilian Cavalotti wrote:
>>
>> Hi Loris,
>>
>> On Thu, Dec 8, 2022 at 12:59 AM Loris Bennett wrote:
>>> However, I do have a chronic problem with users requesting too much
>>> memory. My approach has been to try to get people to use

Re: [slurm-users] srun --mem issue

2022-12-08 Thread Loris Bennett
Ryan Novosielski writes:
> On Dec 8, 2022, at 03:57, Loris Bennett wrote:
>
> Loris Bennett writes:
>
> Moshe Mergy writes:
>
> Hi Sandor
>
> I personally block "--mem=0" requests in file job_submit.lua (slurm 20.02):
>
> if (job_desc.min_mem_per_node == 0 or job_desc.min_mem_per_cpu

Re: [slurm-users] srun --mem issue

2022-12-08 Thread Ryan Novosielski
On Dec 8, 2022, at 03:57, Loris Bennett <loris.benn...@fu-berlin.de> wrote:
Loris Bennett <loris.benn...@fu-berlin.de> writes:
Moshe Mergy <moshe.me...@weizmann.ac.il> writes:
Hi Sandor
I personally block "--mem=0" requests in file job_submit.lua (slurm 20.02):
if (job

Re: [slurm-users] srun --mem issue

2022-12-08 Thread Ryan Novosielski
> On Dec 8, 2022, at 21:30, Kilian Cavalotti wrote:
>
> Hi Loris,
>
> On Thu, Dec 8, 2022 at 12:59 AM Loris Bennett wrote:
>> However, I do have a chronic problem with users requesting too much
>> memory. My approach has been to try to get people to use 'seff' to see
>> what resources thei

Re: [slurm-users] srun --mem issue

2022-12-08 Thread Kilian Cavalotti
Hi Loris,
On Thu, Dec 8, 2022 at 12:59 AM Loris Bennett wrote:
> However, I do have a chronic problem with users requesting too much
> memory. My approach has been to try to get people to use 'seff' to see
> what resources their jobs in fact need. In addition each month we
> generate a graphical

Re: [slurm-users] Help debugging Slurm configuration

2022-12-08 Thread Glen MacLachlan
Then try using the IP of the controller node as explained here https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmctldAddr or here https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmctldHost. Also, if you look at the first few lines of /etc/hosts (just above the line that reads ### ALL ENTRIES BEL

Re: [slurm-users] Help debugging Slurm configuration

2022-12-08 Thread Glen MacLachlan
Hi,
Try starting the slurmd daemon on the compute node interactively with the following command and share any output:
    /usr/sbin/slurmd -D -vvv
Kind Regards, Glen
== Glen MacLachlan, PhD *Lead High Performance Computing Engineer * Research Technology Services The George Wash

Re: [slurm-users] Help debugging Slurm configuration

2022-12-08 Thread Jeffrey Layton
localhost is the ctrl name :) I can change it though if needed (I was lazy when I did the initial installation).
Thanks! Jeff
On Thu, Dec 8, 2022 at 2:30 PM Glen MacLachlan wrote:
> One other thing to address is that SlurmctldHost should point to the
> controller node where slurmctld is runn

Re: [slurm-users] Help debugging Slurm configuration

2022-12-08 Thread Jeffrey Layton
Thanks Glen! I changed the slurm.conf logging to "debug5" on both the server and the client. I also created /var/log/slurm on both the client and server and chown-ed it to slurm:slurm. On the server I did "scontrol reconfigure". Then I rebooted the compute node. When I logged in, slurm was not up.

Re: [slurm-users] Help debugging Slurm configuration

2022-12-08 Thread Glen MacLachlan
One other thing to address is that SlurmctldHost should point to the controller node where slurmctld is running, the name of which I would expect Warewulf would put into /etc/hosts. https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmctldHost Kind Regards, Glen
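The SlurmctldHost advice above can be checked with a small script. This is a minimal sketch, assuming a POSIX shell: the function names slurmctld_host and check_slurmctld_host and the path /etc/slurm/slurm.conf are illustrative and do not come from the thread. It matches only the exact spelling "SlurmctldHost" at the start of a line, so a differently-cased key would be missed.

```shell
#!/bin/sh
# Hypothetical helpers (names are assumptions, not part of Slurm).

# Print the first SlurmctldHost value found in a slurm.conf-style file.
# Accepts "SlurmctldHost=name" and "SlurmctldHost=name(addr)";
# lines that start with "#" do not match and are therefore skipped.
slurmctld_host() {
    conf_file=$1
    sed -n 's/^[[:space:]]*SlurmctldHost=\([^[:space:]#(]*\).*/\1/p' "$conf_file" | head -n 1
}

# Return 1 if the name is empty or a loopback name that a compute node
# would resolve to itself rather than to the controller; otherwise return 0.
check_slurmctld_host() {
    case "$1" in
        ""|localhost|localhost.*|127.*) return 1 ;;
        *) return 0 ;;
    esac
}

# Example use (the path is an assumption; adjust to your installation):
#   host=$(slurmctld_host /etc/slurm/slurm.conf)
#   check_slurmctld_host "$host" || echo "SlurmctldHost is '$host'; use the controller's real name"
```

A loopback name works on the controller itself, but a compute node that reads the same slurm.conf would look for slurmctld on its own loopback interface, which is why the check treats localhost as a warning case.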

Re: [slurm-users] Help debugging Slurm configuration

2022-12-08 Thread Glen MacLachlan
What does running this on the compute node show? (looks at journal log for past 12 hours)
    journalctl -S -12h -o verbose | grep slurm
You may want to increase your debug verbosity to debug5 https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdDebug while tracking down this issue. For reference, see

[slurm-users] Help debugging Slurm configuration

2022-12-08 Thread Jeffrey Layton
Good afternoon, I have a very simple two node cluster using Warewulf 4.3. I was following some instructions on how to install the OpenHPC Slurm binaries (server and client). I booted the compute node and the Slurm Server says it's in an unknown state. This hasn't happened to me before but I would

Re: [slurm-users] srun --mem issue

2022-12-08 Thread Felho, Sandor
Bill - thank you for the code. For the record, our queues explicitly block any requests greater than what the queue allows - MaxMemPerNode = 64 GiB. If there are any other ideas, feel free to comment.
Sandor
From: slurm-users on behalf of Bill Sent: Thursda

Re: [slurm-users] srun --mem issue

2022-12-08 Thread Bill
If you use a job_submit.lua script just add
    if job_desc.pn_min_memory == 0 or job_desc.min_mem_per_cpu == 0 then
        slurm.log_info("slurm_job_submit: job from uid %d invalid memory request MaxMemPerNode", job_desc.user_id)
        return 2044 -- signal ESLURM_INVALID_TASK_MEMORY
    end
Bill

Re: [slurm-users] srun --mem issue

2022-12-08 Thread René Sitt
Hi, same here - since RealMemory will almost always be < Free Memory, setting --mem=0 will get the job rejected. Downside is that we have to sensitize our users to request a little less than the 'theoretical maximum' of the nodes - I have some heuristics in job_submit.lua to output hints at j

Re: [slurm-users] srun --mem issue

2022-12-08 Thread Loris Bennett
Hi Moshe,
Moshe Mergy writes:
> Hi Loris
>
> indeed https://slurm.schedmd.com/resource_limits.html explains the
> possibilities of limitations
>
> At present time, I do not limit memory for specific users, but just a global
> limitation in slurm.conf:
>
> MaxMemPerNode=65536 (for 64 GB limit

Re: [slurm-users] srun --mem issue

2022-12-08 Thread Moshe Mergy
Hi Loris
indeed https://slurm.schedmd.com/resource_limits.html explains the possibilities of limitations
At present time, I do not limit memory for specific users, but just a global limitation in slurm.conf:
    MaxMemPerNode=65536 (for 64 GB limitation)
But... anyway, for my Slurm version 2

Re: [slurm-users] srun --mem issue

2022-12-08 Thread Loris Bennett
Loris Bennett writes:
> Moshe Mergy writes:
>
>> Hi Sandor
>>
>> I personally block "--mem=0" requests in file job_submit.lua (slurm 20.02):
>>
>> if (job_desc.min_mem_per_node == 0 or job_desc.min_mem_per_cpu == 0) then
>>   slurm.log_info("%s: ERROR: unlimited memory requested", log_

Re: [slurm-users] srun --mem issue

2022-12-08 Thread Loris Bennett
Moshe Mergy writes:
> Hi Sandor
>
> I personally block "--mem=0" requests in file job_submit.lua (slurm 20.02):
>
> if (job_desc.min_mem_per_node == 0 or job_desc.min_mem_per_cpu == 0) then
>   slurm.log_info("%s: ERROR: unlimited memory requested", log_prefix)
>   slurm.log_info