Hello, 

You might want to use the node_reg_mem_percent parameter 
(https://slurm.schedmd.com/slurm.conf.html#OPT_node_reg_mem_percent). For 
example, if set to 80, a node will still be accepted even if it registers with 
only 80% of the declared memory. 
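A minimal slurm.conf sketch, assuming the option is set as a SlurmctldParameters 
suboption (check the slurm.conf man page for your Slurm version):

```ini
# Accept node registrations reporting at least 80% of the configured RealMemory
SlurmctldParameters=node_reg_mem_percent=80
```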

Guillaume 


From: "Xaver Stiensmeier via slurm-users" <[email protected]> 
To: [email protected] 
Sent: Thursday, 14 August 2025 10:01:26 
Subject: [slurm-users] Nodes Become Invalid Due to Less Total RAM Than Expected 



Dear slurm-user list, 

in the past we had a bigger buffer between RealMemory 
(https://slurm.schedmd.com/slurm.conf.html#OPT_RealMemory) and the instance 
memory. We then discovered that the right way is to activate the memory option 
(SelectTypeParameters=CR_Core_Memory) and to set MemSpecLimit 
(https://slurm.schedmd.com/slurm.conf.html#OPT_MemSpecLimit) to reserve RAM 
for system processes. 

However, now we run into the problem that, due to on-demand scheduling, we have 
to set up slurm.conf in advance using the RAM values of our flavors as reported 
by our cloud provider (OpenStack). These RAM values are higher than the RAM the 
machines actually have later on: 

ram_in_mib (OpenStack)    total_ram_in_mib (top/slurm) 

2048                      1968 
16384                     15991 
32768                     32093 
65536                     64297 
122880                    120749 
245760                    241608 
491520                    483528 



Given that we have to define slurm.conf in advance, we more or less have to 
predict how much total RAM the instances will have once created. Of course I 
used linear regression to approximate the total RAM and then lowered the result 
a bit to leave some cushion, but this feels unsafe, given that future flavors 
could differ from the ones fitted. 
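For concreteness, the fit looks roughly like this (a sketch over the table 
above; the 1% cushion is an arbitrary choice of mine):

```python
# Least-squares fit: actual total RAM (top/slurm) vs. flavor RAM (OpenStack),
# using the measurements from the table above (all values in MiB).
flavor = [2048, 16384, 32768, 65536, 122880, 245760, 491520]
actual = [1968, 15991, 32093, 64297, 120749, 241608, 483528]

n = len(flavor)
mean_x = sum(flavor) / n
mean_y = sum(actual) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(flavor, actual)) \
        / sum((x - mean_x) ** 2 for x in flavor)
intercept = mean_y - slope * mean_x

def predict_real_memory(flavor_mib, cushion=0.99):
    """Predict a RealMemory value for slurm.conf, lowered a bit for safety."""
    return int((slope * flavor_mib + intercept) * cushion)
```

With this data the slope comes out just below 1, and the 1% cushion keeps every 
prediction under the RAM actually observed — but only for flavors resembling 
the fitted ones, which is exactly the weakness described above.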

From the kernel documentation 
(https://www.kernel.org/doc/Documentation/filesystems/proc.txt) I know that 
MemTotal is 



MemTotal: Total usable ram (i.e. physical ram minus a few reserved bits and the 
kernel binary code) 



but given that the concrete reserved bits are quite complex 
(https://witekio.com/blog/cat-proc-meminfo-memtotal/), I am wondering whether I 
am doing something wrong, as this issue doesn't feel niche enough to be this 
complicated. 
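For reference, the right-hand column of the table comes from MemTotal; a small 
sketch of how I read it (the sample string here is illustrative — on a real 
node one would read /proc/meminfo itself):

```python
import re

def mem_total_mib(meminfo_text):
    """Extract MemTotal (reported in kB) from /proc/meminfo content, in MiB."""
    match = re.search(r"^MemTotal:\s+(\d+)\s*kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found")
    return int(match.group(1)) // 1024

# On a live node:
#   with open("/proc/meminfo") as f:
#       total = mem_total_mib(f.read())
sample = "MemTotal:       16374784 kB\nMemFree:         1234567 kB\n"
print(mem_total_mib(sample))  # 15991
```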

--- 

Anyway, setting the RealMemory value in slurm.conf above the actual total RAM 
(by predicting too much) leads to errors and to nodes being marked as invalid: 


[2025-08-11T08:19:04.736] debug: Node NODE_NAME has low real_memory size 
(241607 / 245760) < 100.00% 
[2025-08-11T08:19:04.736] error: _slurm_rpc_node_registration node=NODE_NAME: 
Invalid argument 


or 


[2025-07-03T12:57:18.486] error: Setting node NODE_NAME state to INVAL with 
reason:Low RealMemory (reported:64295 < 100.00% of configured:68719) 


Any hint on how to solve this is much appreciated! 
Best regards, 
Xaver 



-- 
slurm-users mailing list -- [email protected] 
To unsubscribe send an email to [email protected] 