Thank you very much for the many experiences shared - especially for
pointing out how RAM requirements may grow over time!
Our instances can vary wildly from 2 GB (rather unreasonable for Slurm)
to multiple TB of RAM, and since we only provide resources and tooling
but do not manage the running clusters, we cannot readjust values once
a cluster has started.
Currently, I am considering using CR_Core_Memory with the following
node configuration (all values in MB):
RealMemory = node_memory
MemSpecLimit = min(node_memory // 4 + 1000, 8000)
This would result in (node RAM -> reserved):
2 GB  -> 1 GB (which is unreasonably small anyway)
4 GB  -> 2 GB
8 GB  -> 3 GB
16 GB -> 5 GB
32 GB -> 8 GB
...   -> 8 GB (cap)
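For illustration, a small Python sketch of that formula (assuming
node_memory is the node's RAM in MB):

    def memspec_limit_mb(node_memory_mb: int) -> int:
        # Reserve a quarter of the node's RAM plus 1000 MB, capped at 8000 MB.
        return min(node_memory_mb // 4 + 1000, 8000)

    for gb in (2, 4, 8, 16, 32, 64):
        print(gb, "GB ->", memspec_limit_mb(gb * 1024), "MB reserved")
    # -> 1512, 2024, 3048, 5096, 8000, 8000 MB, i.e. roughly the rounded
    #    GB figures listed above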
This tries to respect the fact that smaller instances simply cannot
give much RAM to the system. I know that, especially on the 2 GB RAM
instances, this will probably still lead to OOM terminations, but if I
reserve the whole 2 GB for Slurm and the OS, there is not much left to
compute with. We will add a warning that instances with less than 4 GB
RAM are not really feasible as worker nodes. I feel we will only be
able to improve this formula with more experience.
Best regards
Xaver
On 5/12/25 21:30, Timony, Mick via slurm-users wrote:
We do something very similar at HMS. For instance, on nodes with
257468 MB of RAM we round RealMemory down to 257000 MB, and for nodes
with 1031057 MB of RAM we round down to 1000000 MB, etc.
We may tune this on our next OS and Slurm update, as I expect to see
more memory used by the OS once we migrate to RHEL9.
Cheers
--
Mick Timony
Senior DevOps Engineer
LASER, Longwood, & O2 Cluster Admin
Harvard Medical School
--
------------------------------------------------------------------------
*From:* Paul Edmon via slurm-users <slurm-users@lists.schedmd.com>
*Sent:* Monday, May 12, 2025 10:14 AM
*To:* slurm-users@lists.schedmd.com
*Subject:* [slurm-users] Re: Do I have to hold back RAM for worker nodes?
The way we typically do it here is to look at the idle memory usage of
the OS on the node and then reserve the nearest power of 2 for that.
For instance, right now we have 16 GB set for our MemSpecLimit. That
may seem like a lot, but our nodes typically have 1 TB of memory, so
16 GB is not that much. The newer hardware tends to eat up more base
memory, at least in my experience.
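As a rough illustration of that idea (one possible interpretation:
round the observed idle/base memory usage up to the next power of two;
the 12 GB input below is only a made-up example):

    import math

    def memspec_from_idle_mb(idle_mb: int) -> int:
        # Round the OS's idle/base memory usage (MB) up to the next power of two.
        return 2 ** math.ceil(math.log2(idle_mb))

    print(memspec_from_idle_mb(12 * 1024))  # 16384 MB, i.e. 16 GB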
-Paul Edmon-
On 5/12/25 8:55 AM, Xaver Stiensmeier via slurm-users wrote:
Josh,
thank you for your thorough answer. I, too, considered switching to
CR_Core_Memory after reading into this. Thank you for confirming my
suspicion that without Memory, we cannot handle high memory requests
adequately.
If I may ask: *How do you come up with the specific MemSpecLimit?* Do
you handpick a value for each node, have you picked a constant value
for all nodes or do you take a capped percentage of the maximum
memory available?
Best regards,
Xaver
On 5/12/25 14:43, Joshua Randall wrote:
Xaver,
It is my understanding that if we want to have stable systems that
don't run out of memory, we do need to manage the amount of memory
needed for everything not running within a slurm job, yes.
In our cluster, we are using `CR_Core_Memory` (so we do constrain
our job memory) and we set the `RealMemory` to the actual full
amount of memory available on the machine - I believe these really
are given in megabytes (MB), not mebibytes (MiB). I think their
example (e.g. "2048") is intended to convey this because 2000 MiB
is 2048 MB. We set the `MemSpecLimit` for each node to set memory
aside for everything in the system that is not running within a
slurm job. This includes the slurm daemon itself, the kernel,
filesystem drivers, metrics collection agents, etc -- anything else
we are running outside the control of slurm jobs. The `MemSpecLimit`
just sets aside the specified amount and the result will be that the
maximum memory jobs can use on the node is (RealMemory -
MemSpecLimit). When using cgroups to limit memory, slurmd will also
be allocated the specified limit so that the slurm daemon cannot
encroach on job memory. However, note that `MemSpecLimit` is
documented to not work unless your `SelectTypeParameters` includes
Memory as a consumable resource.
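A minimal slurm.conf-style sketch of such a setup (the node name, core
count, and memory values here are made-up examples, not a
recommendation):

    SelectType=select/cons_tres
    SelectTypeParameters=CR_Core_Memory
    # hypothetical 128 GB node: 4000 MB set aside for the OS, slurmd and
    # other system daemons, so jobs can allocate at most
    # RealMemory - MemSpecLimit = 124000 MB
    NodeName=worker[001-010] CPUs=32 RealMemory=128000 MemSpecLimit=4000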
Since you are using `CR_Core` (which does not configure Memory as a
consumable resource), I believe your system will not be
constraining job memory at all. Jobs can oversubscribe memory as
many times over as there are cores, and any job would be able to run
the machine out of memory by using more than is available. With this
setting, I guess you could say you don't have to manage reserving
memory for the OS and slurmd, but only in the sense that any job
could consume all the memory and cause the system OOM killer to kill
a random process (including slurmd or something else system critical).
Cheers,
Josh.
--
Dr. Joshua C. Randall
Director of Software Engineering, HPC
Altos Labs
email: jrand...@altoslabs.com
On Mon, May 12, 2025 at 10:27 AM Xaver Stiensmeier via slurm-users
<slurm-users@lists.schedmd.com> wrote:
Dear Slurm-User List,
currently, in our slurm.conf, we are setting:
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
and in our node configuration, /RealMemory/ was basically reduced
by some amount to make sure the node always had enough RAM left to run
the OS. However, this is apparently not how it is supposed to be
done:
Lowering RealMemory with the goal of setting aside some
amount for the OS and not available for job allocations will
not work as intended if Memory is not set as a consumable
resource in *SelectTypeParameters*. So one of the *_Memory
options need to be enabled for that goal to be accomplished.
(https://slurm.schedmd.com/slurm.conf.html#OPT_RealMemory)
This leads to four questions regarding holding back RAM for
worker nodes. Answers/help with any of those questions would be
appreciated.
*1.* Is reserving enough RAM for the worker node's OS and
slurmd actually a thing you have to manage?
*2.* If so how can we reserve enough RAM for the worker
node's OS and slurmd when using CR_Core?
*3.* Is that maybe a strong argument against using CR_Core
that we overlooked?
And semi-related:
https://slurm.schedmd.com/slurm.conf.html#OPT_RealMemory
talks about taking a value in megabytes.
*4.* Is RealMemory really expecting megabytes or is it
mebibytes?
Best regards,
Xaver
Altos Labs UK Limited | England | Company reg 13484917
Registered address: 3rd Floor 1 Ashley Road, Altrincham, Cheshire,
United Kingdom, WA14 2DT
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com