Hi,

While deploying Slurm and having some trouble getting slurmd to start on the nodes, I found a useful command to check the memory size seen by Slurm on a compute node:

sudo slurmd -C

This could be helpful. I then set the node's memory size a little lower than the reported value to avoid running out of memory, especially when a user allocates the full node.
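
For example, on one of my nodes it looked roughly like this (the numbers here are made up, and the exact fields printed may vary between Slurm versions):

sudo slurmd -C
NodeName=node01 CPUs=32 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=128500

and in slurm.conf I then declare the node with a slightly lower value, for example:

NodeName=node01 CPUs=32 RealMemory=126000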

Patrick


On 12/05/2025 at 14:55, Xaver Stiensmeier via slurm-users wrote:

Josh,

thank you for your thorough answer. I, too, considered switching to CR_Core_Memory after reading up on this. Thank you for confirming my suspicion that without Memory as a consumable resource, we cannot handle high memory requests adequately.

If I may ask: *How do you come up with the specific MemSpecLimit?* Do you handpick a value for each node, use a constant value for all nodes, or take a capped percentage of the maximum memory available?

Best regards, Xaver

On 5/12/25 14:43, Joshua Randall wrote:
Xaver,

Yes, it is my understanding that if we want stable systems that don't run out of memory, we do need to account for the memory used by everything not running within a slurm job.

In our cluster, we are using `CR_Core_Memory` (so we do constrain job memory), and we set `RealMemory` to the actual full amount of memory available on the machine. I believe these values really are given in megabytes (MB), not mebibytes (MiB), as the documentation states, although the example value there ("2048") could arguably be read either way.

We set `MemSpecLimit` for each node to set memory aside for everything on the system that is not running within a slurm job: the slurm daemon itself, the kernel, filesystem drivers, metrics collection agents, and anything else we run outside the control of slurm jobs. `MemSpecLimit` simply reserves the specified amount, so the maximum memory jobs can use on the node is (RealMemory - MemSpecLimit). When using cgroups to limit memory, slurmd itself is also confined to the specified limit so that the daemon cannot encroach on job memory.

However, note that `MemSpecLimit` is documented not to work unless your `SelectTypeParameters` includes Memory as a consumable resource.
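
To make that concrete, a minimal sketch of the relevant pieces might look like this (node names and sizes are invented for illustration, not our actual config):

    SelectType=select/cons_tres
    SelectTypeParameters=CR_Core_Memory
    # hypothetical node: set aside 16384 for the OS, slurmd, metrics agents, etc.
    # jobs can then use at most RealMemory - MemSpecLimit = 262144 - 16384 = 245760
    NodeName=node[01-10] CPUs=64 RealMemory=262144 MemSpecLimit=16384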

Since you are using `CR_Core` (which does not configure Memory as a consumable resource), I believe your system is not constraining job memory at all. Jobs can oversubscribe memory as many times over as there are cores, and any single job can run the machine out of memory by using more than is available. With this setting, I guess you could say you don't have to manage reserving memory for the OS and slurmd, but only in the sense that any job could consume all the memory and cause the system OOM killer to kill a random process (including slurmd or something else system-critical).
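
By "constraining job memory" I mean the combination of a memory-aware SelectTypeParameters and cgroup enforcement, roughly something like this (a sketch, not our exact config):

    # slurm.conf
    SelectTypeParameters=CR_Core_Memory
    TaskPlugin=task/cgroup

    # cgroup.conf
    ConstrainRAMSpace=yes

As far as I understand, the cgroup memory limits are derived from each job's memory allocation, so they only really take effect when Memory is a consumable resource.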

Cheers,

Josh.


--
Dr. Joshua C. Randall
Director of Software Engineering, HPC
Altos Labs
email: jrand...@altoslabs.com



On Mon, May 12, 2025 at 10:27 AM Xaver Stiensmeier via slurm-users <slurm-users@lists.schedmd.com> wrote:

    Dear Slurm-User List,

    Currently, in our slurm.conf, we are setting:

        SelectType=select/cons_tres
        SelectTypeParameters=CR_Core

    and in our node configuration RealMemory was reduced by some amount to make sure the node always has enough RAM to run the OS. However, this is apparently not how it is supposed to be done:

        Lowering RealMemory with the goal of setting aside some
        amount for the OS and not available for job allocations will
        not work as intended if Memory is not set as a consumable
        resource in *SelectTypeParameters*. So one of the *_Memory
        options need to be enabled for that goal to be accomplished.
        (https://slurm.schedmd.com/slurm.conf.html#OPT_RealMemory)
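
    For concreteness, what we had been doing in the node configuration was roughly the following (hostnames and sizes are made up for illustration):

        # node actually has about 128000 MB; we declared less to leave headroom for the OS
        NodeName=worker[001-020] CPUs=32 RealMemory=122000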

    This leads to four questions regarding holding back RAM for
    worker nodes. Answers/help with any of those questions would be
    appreciated.

        *1.* Is reserving enough RAM for the worker node's OS and slurmd actually something you have to manage?
        *2.* If so, how can we reserve enough RAM for the worker node's OS and slurmd when using CR_Core?
        *3.* Is that maybe a strong argument against using CR_Core that we overlooked?

    And semi-related:
    https://slurm.schedmd.com/slurm.conf.html#OPT_RealMemory talks
    about taking a value in megabytes.

        *4.* Is RealMemory really expecting megabytes or is it mebibytes?

    Best regards, Xaver




-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
