Hi,

same here - since RealMemory will almost always be < Free Memory, setting --mem=0 will get the job rejected. The downside is that we have to make our users aware that they should request a little less than the 'theoretical maximum' of the nodes. I have some heuristics in job_submit.lua that print hints at submit time when a job's memory request is very near or slightly over a node type's maximum free memory.
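For illustration only, a hint heuristic of this kind might look roughly like the sketch below. It is not the code running on any cluster in this thread: the node-type table, the memory values, the two thresholds, and the function name `memory_hint` are all hypothetical, and it is written as plain Lua so it can be tried outside Slurm.

```lua
-- Hypothetical usable memory per node type, in MB. The values are invented;
-- in practice they would mirror the RealMemory settings in slurm.conf.
local NODE_TYPES = {
  { name = "standard", usable_mb = 190000 },
  { name = "bigmem",   usable_mb = 380000 },
}

-- Assumed thresholds: warn from 95% of a node type's usable memory upwards,
-- and treat up to 110% as "slightly over".
local NEAR_FRACTION = 0.95
local OVER_FRACTION = 1.10

-- Returns a hint string if requested_mb is near or slightly over the usable
-- memory of some node type, otherwise nil.
local function memory_hint(requested_mb)
  for _, nt in ipairs(NODE_TYPES) do
    local ratio = requested_mb / nt.usable_mb
    if ratio > 1.0 and ratio <= OVER_FRACTION then
      return string.format(
        "Hint: %d MB is slightly more than a '%s' node can provide (%d MB); "
          .. "the job cannot run on that node type.",
        requested_mb, nt.name, nt.usable_mb)
    elseif ratio >= NEAR_FRACTION and ratio <= 1.0 then
      return string.format(
        "Hint: %d MB is close to the maximum a '%s' node can provide (%d MB); "
          .. "consider requesting a little less.",
        requested_mb, nt.name, nt.usable_mb)
    end
  end
  return nil
end

-- Stand-alone usage: print a hint only if there is one.
local hint = memory_hint(200000)
if hint then
  print(hint)
end
```

Inside job_submit.lua, the returned string could presumably be passed to slurm.log_user() with job_desc.min_mem_per_node as the input. How unset or per-CPU memory requests appear in job_desc is something to check against the Slurm version in use.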
Kind regards,
René Sitt

On 08.12.22 at 11:28, Loris Bennett wrote:
Hi Moshe,

Moshe Mergy <moshe.me...@weizmann.ac.il> writes:

Hi Loris

indeed https://slurm.schedmd.com/resource_limits.html explains the possibilities of limitations.

At present, I do not limit memory for specific users, but just set a global limit in slurm.conf:

    MaxMemPerNode=65536 (for 64 GB limitation)

But... anyway, for my Slurm version 20.02, any user can obtain MORE than 64 GB of memory by using the "--mem=0" option! So I had to filter this in job_submit.lua

We don't use MaxMemPerNode but define RealMemory for groups of nodes which have the same amount of RAM. We share the nodes and use

    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory

So a job can't start on a node if it requests more memory than available, i.e. more than RealMemory minus memory already committed to other jobs, even if --mem=0 is specified (I guess).

Cheers,
Loris

------------------------------------------------------------
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Loris Bennett <loris.benn...@fu-berlin.de>
Sent: Thursday, December 8, 2022 10:57:56 AM
To: Slurm User Community List
Subject: Re: [slurm-users] srun --mem issue

Loris Bennett <loris.benn...@fu-berlin.de> writes:

Moshe Mergy <moshe.me...@weizmann.ac.il> writes:

Hi Sandor

I personally block "--mem=0" requests in file job_submit.lua (slurm 20.02):

    -- Reject requests for unlimited memory (--mem=0 or --mem-per-cpu=0)
    if (job_desc.min_mem_per_node == 0 or job_desc.min_mem_per_cpu == 0) then
        slurm.log_info("%s: ERROR: unlimited memory requested", log_prefix)
        slurm.log_info("%s: ERROR: job %s from user %s rejected because of an invalid (unlimited) memory request.", log_prefix, job_desc.name, job_desc.user_name)
        slurm.log_user("Job rejected because of an invalid memory request.")
        return slurm.ERROR
    end

What happens if somebody explicitly requests all the memory, so in Sandor's case --mem=500G?

Maybe there is a better or nicer solution...

Can't you just use account and QOS limits:

https://slurm.schedmd.com/resource_limits.html

? And anyway, what is the use-case for preventing someone using all the memory? In our case, if someone really needs all the memory, they should be able to have it.

However, I do have a chronic problem with users requesting too much memory. My approach has been to try to get people to use 'seff' to see what resources their jobs in fact need. In addition, each month we generate a graphical summary of 'seff' data for each user, like the one shown here

https://www.fu-berlin.de/en/sites/high-performance-computing/Dokumentation/Statistik

and automatically send an email to those with a large percentage of resource-inefficient jobs, telling them to look at their graphs and correct their resource requirements for future jobs.

Cheers,
Loris

All the best
Moshe

------------------------------------------------------------
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Felho, Sandor <sandor.fe...@transunion.com>
Sent: Wednesday, December 7, 2022 7:03 PM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] srun --mem issue

TransUnion is running a ten-node site using slurm with multiple queues. We have an issue with the --mem parameter. There is one user who has read the slurm manual and found --mem=0. This gives the maximum memory on the node (500 GiB) to the single job. How can I block a --mem=0 request?

We are running:

* OS: RHEL 7
* cgroups version 1
* slurm: 19.05

Thank you,
Sandor Felho
Sr Consultant, Data Science & Analytics
--
Dipl.-Chem. René Sitt
Hessisches Kompetenzzentrum für Hochleistungsrechnen
Philipps-Universität Marburg
Hans-Meerwein-Straße
35032 Marburg
Tel. +49 6421 28 23523
si...@hrz.uni-marburg.de
www.hkhlr.de