Hi,
I have been looking at this useful documentation:
https://slurm.schedmd.com/cpu_management.html
We have people complaining about memory performance issues while running
highly distributed jobs on a shared HPC cluster. After looking into it,
we saw that it is due either to concurrent memory access with other
users' jobs, or to memory accesses jumping from one socket to another
(NUMA architecture).
Until now, our "easy" answer has been to use the "exclusive" mode in
sbatch, to rule out any concurrent memory access, and to run numactl to
check whether the job ends up bound to the other socket.
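For context, our current workaround looks roughly like this (task counts
and the application name are just placeholders):

    #!/bin/bash
    #SBATCH --exclusive        # take the whole node so no other job touches its memory
    #SBATCH --ntasks=8         # illustrative values only
    #SBATCH --cpus-per-task=4

    numactl --hardware         # show the NUMA layout of the node
    numactl --show             # check which CPUs/NUMA nodes the job shell is bound to
    srun ./our_application     # placeholder for the real code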
However, I am looking for a better solution, as exclusive mode takes a
whole big node (we only have big nodes...), and numactl only tells us
whether we will get poor performance; it does not prevent it.
I checked the slurm.conf parameters, but everything I found is about
changing SelectType. However, I cannot use another SelectType since we
have GPUs, so we are on cons_tres. Moreover, we already have a very
complicated Slurm configuration, and I would like to avoid any side
effects.
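For reference, the relevant part of our configuration is roughly the
following (simplified; the SelectTypeParameters value is only an example,
ours has more options):

    SelectType=select/cons_tres
    SelectTypeParameters=CR_Core_Memory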
I am thinking about pointing our users to the `--cpu-bind` option
(https://slurm.schedmd.com/srun.html#OPT_cpu-bind), but I am not sure how
to use it, and it seems to be limited to srun...?
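For instance, I guess the intended usage is something like the following
(the task counts and binding choices are just my guess):

    srun --ntasks=8 --cpu-bind=verbose,cores ./our_application
    # or, binding tasks to sockets and their memory to the local NUMA node:
    srun --ntasks=2 --cpu-bind=sockets --mem-bind=local ./our_application

Is that the right way to use it, and can users rely on it from inside an
sbatch script?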
From the job's point of view, I know there are tools to deal with this,
like likwid or placement (though not entirely?), but they do not seem
easy to use, and a Slurm-level solution would be more suitable and more
generic.
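(For example, as far as I understand, likwid-pin would be used roughly
like this, with the core lists purely illustrative:

    likwid-pin -c 0-7 ./our_application        # pin threads to cores 0-7
    likwid-pin -c S0:0-7 ./our_application     # or pin within socket 0

but that means every user has to learn yet another tool.)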
Do you have any ideas on how to deal with these issues?
Thanks,
Best regards,
Rémy Dernat
--
slurm-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]