Hi,

same here - since RealMemory will almost always be < Free Memory, setting --mem=0 will get the job rejected. The downside is that we have to make our users aware that they should request a little less than the 'theoretical maximum' of the nodes - I have some heuristics in job_submit.lua that output hints at job submission for cases where a job's memory request is very near or slightly over a node type's maximum free memory.
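
The heuristic is roughly the following sketch (simplified; the node memory table and the 95% threshold here are illustrative placeholders, not our production code):

   -- Illustrative sketch only: warn when a memory request is close to or
   -- over a node type's usable maximum (the MB values are made up).
   local node_max_mem = { standard = 191000, bigmem = 1020000 }

   local function mem_hint(job_desc)
      local max_mb = node_max_mem[job_desc.partition or "standard"]
      local req_mb = job_desc.min_mem_per_node  -- unset if --mem was not given
      if max_mb ~= nil and req_mb ~= nil and req_mb > max_mb * 0.95 then
         slurm.log_user("Hint: requested %d MB; usable maximum on this node type is %d MB.",
                        req_mb, max_mb)
      end
   end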

Kind regards,
René Sitt

On 08.12.22 at 11:28, Loris Bennett wrote:
Hi Moshe,

Moshe Mergy <moshe.me...@weizmann.ac.il> writes:

Hi Loris

indeed, https://slurm.schedmd.com/resource_limits.html explains the available types of limits

At present, I do not limit memory for specific users, but just set a global limit in slurm.conf:

   MaxMemPerNode=65536 (for a 64 GB limit)

But... anyway, with my Slurm version 20.02, any user can obtain MORE than
64 GB of memory by using the "--mem=0" option!
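
This is easy to reproduce (an illustrative session, not taken from my logs):

   srun --mem=0 --pty bash
   scontrol show job $SLURM_JOB_ID | grep -i mem

The second command should show the allocation covering the node's full memory.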

So I had to filter this in job_submit.lua.
We don't use MaxMemPerNode but define RealMemory for groups of nodes
which have the same amount of RAM.  We share the nodes and use

   SelectType=select/cons_res
   SelectTypeParameters=CR_Core_Memory

So a job can't start on a node if it requests more memory than
available, i.e. more than RealMemory minus memory already committed to
other jobs, even if --mem=0 is specified (I guess).
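
For illustration, the relevant slurm.conf pieces look something like this
(the node names, CPU counts, and memory values here are made up):

   # Nodes grouped by RAM size; RealMemory is in MB
   NodeName=node[001-032] CPUs=32 RealMemory=191000
   NodeName=bigmem[01-04] CPUs=64 RealMemory=1020000
   SelectType=select/cons_res
   SelectTypeParameters=CR_Core_Memory

With CR_Core_Memory, memory is a consumable resource, so each request is
checked against RealMemory minus what is already allocated on the node.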

Cheers,

Loris

---------------------------------------------------------------------------
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Loris Bennett 
<loris.benn...@fu-berlin.de>
Sent: Thursday, December 8, 2022 10:57:56 AM
To: Slurm User Community List
Subject: Re: [slurm-users] srun --mem issue
Loris Bennett <loris.benn...@fu-berlin.de> writes:

Moshe Mergy <moshe.me...@weizmann.ac.il> writes:

Hi Sandor

I personally block "--mem=0" requests in job_submit.lua (Slurm 20.02):

   if (job_desc.min_mem_per_node == 0 or job_desc.min_mem_per_cpu == 0) then
      -- --mem=0 / --mem-per-cpu=0 means "all memory on the node", so reject it
      slurm.log_info("%s: ERROR: unlimited memory requested", log_prefix)
      slurm.log_info("%s: ERROR: job %s from user %s rejected because of an invalid (unlimited) memory request.",
                     log_prefix, job_desc.name, job_desc.user_name)
      slurm.log_user("Job rejected because of an invalid memory request.")
      return slurm.ERROR
   end
What happens if somebody explicitly requests all the memory, so in
Sandor's case --mem=500G?

Maybe there is a better or nicer solution...
Can't you just use account and QOS limits:

   https://slurm.schedmd.com/resource_limits.html

?
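
A per-job memory cap as a QOS limit might look something like this (an
untested sketch; 'normal' is a placeholder QOS name and the value is in MB):

   sacctmgr modify qos normal set MaxTRES=mem=65536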

And anyway, what is the use case for preventing someone from using all the
memory? In our case, if someone really needs all the memory, they should be
able to have it.

However, I do have a chronic problem with users requesting too much
memory. My approach has been to try to get people to use 'seff' to see
what resources their jobs in fact need.  In addition, each month we
generate a graphical summary of 'seff' data for each user, like the one
shown here:

   https://www.fu-berlin.de/en/sites/high-performance-computing/Dokumentation/Statistik

and automatically send an email to those with a large percentage of
resource-inefficient jobs telling them to look at their graphs and
correct their resource requirements for future jobs.
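
For anyone who has not used it, checking a completed job is as simple as
(the job ID here is made up):

   seff 1234567

which prints, among other things, the job's CPU and memory efficiency.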

Cheers,

Loris

All the best
Moshe


---------------------------------------------------------------------------
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Felho, Sandor 
<sandor.fe...@transunion.com>
Sent: Wednesday, December 7, 2022 7:03 PM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] srun --mem issue
TransUnion is running a ten-node site using Slurm with multiple queues. We
have an issue with the --mem parameter. There is one user who has read the
Slurm manual and found --mem=0, which gives a single job the maximum memory
on the node (500 GiB). How can I block a --mem=0 request?

We are running:

* OS: RHEL 7
* cgroups version 1
* Slurm: 19.05

Thank you,

Sandor Felho

Sr Consultant, Data Science & Analytics

--
Dipl.-Chem. René Sitt
Hessisches Kompetenzzentrum für Hochleistungsrechnen
Philipps-Universität Marburg
Hans-Meerwein-Straße
35032 Marburg

Tel. +49 6421 28 23523
si...@hrz.uni-marburg.de
www.hkhlr.de

