Ryan Novosielski writes:
>> On Dec 8, 2022, at 21:30, Kilian Cavalotti wrote:
>>
>> Hi Loris,
>>
>> On Thu, Dec 8, 2022 at 12:59 AM Loris Bennett wrote:
>>> However, I do have a chronic problem with users requesting too much
>>> memory. My approach has been to try to get people to use 'seff' to see
>>> what resources their jobs in fact need.
Ryan Novosielski writes:
> On Dec 8, 2022, at 03:57, Loris Bennett wrote:
>
> Loris Bennett writes:
>
> Moshe Mergy writes:
>
> Hi Sandor
>
> I personally block "--mem=0" requests in the job_submit.lua file (Slurm 20.02):
>
> if (job_desc.min_mem_per_node == 0 or job_desc.min_mem_per_cpu == 0) then
Hi Loris,
On Thu, Dec 8, 2022 at 12:59 AM Loris Bennett
wrote:
> However, I do have a chronic problem with users requesting too much
> memory. My approach has been to try to get people to use 'seff' to see
> what resources their jobs in fact need. In addition, each month we
> generate a graphical
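For users who want to see what a completed job actually consumed, something along these lines works (the job ID is a placeholder; 'seff' ships in Slurm's contribs package and may not be installed on every site):
seff 123456
sacct -j 123456 --format=JobID,ReqMem,MaxRSS,Elapsed,State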
Then try using the IP of the controller node as explained here
https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmctldAddr or here
https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmctldHost.
Also, if you look at the first few lines of /etc/hosts (just above the line
that reads ### ALL ENTRIES BELOW
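In slurm.conf that would look something like the line below (the hostname and address are only placeholders for your controller):
SlurmctldHost=ohpc-ctrl(192.168.1.1)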
Hi,
Try starting the slurmd daemon on the compute node interactively with the
following and share any output.
/usr/sbin/slurmd -D -vvv
Kind Regards,
Glen
==
Glen MacLachlan, PhD
Lead High Performance Computing Engineer
Research Technology Services
The George Washington University
localhost is the ctrl name :)
I can change it though if needed (I was lazy when I did the initial
installation).
Thanks!
Jeff
On Thu, Dec 8, 2022 at 2:30 PM Glen MacLachlan wrote:
> One other thing to address is that SlurmctldHost should point to the
> controller node where slurmctld is running.
Thanks Glen!
I changed the slurm.conf logging to "debug5" on both the server and the
client.
I also created /var/log/slurm on both the client and the server and chowned
it to slurm:slurm.
On the server I did "scontrol reconfigure".
Then I rebooted the compute node. When I logged in, slurm was not up.
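To confirm what slurmd did after the reboot, a quick first check on the compute node would be something like the following (the log path assumes SlurmdLogFile points under /var/log/slurm):
systemctl status slurmd
tail -n 50 /var/log/slurm/slurmd.log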
One other thing to address is that SlurmctldHost should point to the
controller node where slurmctld is running, the name of which I would
expect Warewulf would put into /etc/hosts.
https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmctldHost
Kind Regards,
Glen
What does running this on the compute node show? (It looks at the journal
log for the past 12 hours.)
journalctl -S -12h -o verbose | grep slurm
You may want to increase your debug verbosity to debug5
(https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdDebug) while tracking
down this issue.
For reference, see
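The relevant slurm.conf settings would look roughly like this (the log file paths are only examples):
SlurmdDebug=debug5
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmctldDebug=debug5
SlurmctldLogFile=/var/log/slurm/slurmctld.log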
Good afternoon,
I have a very simple two-node cluster using Warewulf 4.3. I was following
some instructions on how to install the OpenHPC Slurm binaries (server and
client). I booted the compute node and the Slurm server says it is in an
unknown state. This hasn't happened to me before, but I would
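The state Slurm assigns to the node, and the reason it records, can be inspected with commands along these lines (the node name is a placeholder):
sinfo -N -l
scontrol show node c01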
Bill - thank you for the code.
For the record, our queues explicitly block any request greater than what the
queue allows - MaxMemPerNode = 64 GiB.
If there are any other ideas, feel free to comment.
Sandor
If you use a job_submit.lua script, just add:
if job_desc.pn_min_memory == 0 or job_desc.min_mem_per_cpu == 0 then
    slurm.log_info("slurm_job_submit: job from uid %d has an invalid memory request (MaxMemPerNode)",
                   job_desc.user_id)
    return 2044 -- ESLURM_INVALID_TASK_MEMORY
end
Bill
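For completeness, here is a minimal sketch of how such a check could sit inside a full job_submit.lua (the field names and the 2044 error code follow Bill's snippet above; JobSubmitPlugins=lua must be set in slurm.conf, and details vary by Slurm version):
-- /etc/slurm/job_submit.lua (illustrative sketch only)
function slurm_job_submit(job_desc, part_list, submit_uid)
    -- Reject explicit --mem=0 / --mem-per-cpu=0, i.e. "give me all the memory"
    if job_desc.pn_min_memory == 0 or job_desc.min_mem_per_cpu == 0 then
        slurm.log_user("Please request a concrete amount of memory instead of --mem=0")
        slurm.log_info("slurm_job_submit: rejected --mem=0 request from uid %d", job_desc.user_id)
        return 2044 -- ESLURM_INVALID_TASK_MEMORY
    end
    return slurm.SUCCESS
end
function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end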
Hi,
same here - since RealMemory will almost always be < Free Memory,
setting --mem=0 will get the job rejected. The downside is that we have to
make our users aware that they should request a little less than the
'theoretical maximum' of the nodes - I have some heuristics in
job_submit.lua to output hints at job submission.
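The heuristics themselves are not shown; one possible shape for such a hint inside slurm_job_submit() is the fragment below (the node size, the margin, and the message are made up, and real code also has to skip the sentinel value that job_desc.pn_min_memory carries when no memory was requested at all):
-- hypothetical hint; values are examples only
local node_mem_mb = 256 * 1024   -- assumed RealMemory of the largest node, in MB
local margin_mb   = 4 * 1024     -- assumed headroom left for the OS
local req_mb      = job_desc.pn_min_memory
if req_mb ~= nil and req_mb > (node_mem_mb - margin_mb) and req_mb <= node_mem_mb then
    slurm.log_user(string.format(
        "Hint: requests above %d MB rarely fit on our nodes; consider asking for less.",
        node_mem_mb - margin_mb))
end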
Hi Moshe,
Moshe Mergy writes:
> Hi Loris
>
> indeed https://slurm.schedmd.com/resource_limits.html explains the
> possible limits
>
> At present I do not limit memory for specific users, just a global
> limit in slurm.conf:
>
> MaxMemPerNode=65536 (for a 64 GB limit)
Hi Loris
indeed https://slurm.schedmd.com/resource_limits.html explains the
possible limits
At present I do not limit memory for specific users, just a global
limit in slurm.conf:
MaxMemPerNode=65536 (for a 64 GB limit)
But... anyway, for my Slurm version 2
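For context, such a global cap in slurm.conf usually sits next to the memory defaults and only takes effect when memory is a tracked resource; the values below are only illustrative:
DefMemPerCPU=2048
MaxMemPerNode=65536
SelectTypeParameters=CR_Core_Memory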
Moshe Mergy writes:
> Hi Sandor
>
> I personally block "--mem=0" requests in the job_submit.lua file (Slurm 20.02):
>
> if (job_desc.min_mem_per_node == 0 or job_desc.min_mem_per_cpu == 0) then
>     slurm.log_info("%s: ERROR: unlimited memory requested", log_prefix)
>     slurm.log_info