Hi Rodrigo,

We do pretty much what you do - constrain via cgroups - and it works fine, so I know it's possible. (I don't think I've ever twiddled VSizeFactor.)

I think you also need

SelectType=select/cons_res (or cons_tres)
SelectTypeParameters=CR_Core_Memory

in your slurm.conf; have you got that?
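
For reference, the relevant slurm.conf lines taken together might look something like this - a sketch only, adjust to your site (and note that select/cons_res was folded into select/cons_tres in newer Slurm releases):

```ini
# Sketch of the memory-enforcement-related parts of slurm.conf.
# Exact values are site-specific; treat this as an illustration.
SelectType=select/cons_tres           ; or select/cons_res on older releases
SelectTypeParameters=CR_Core_Memory   ; make memory a consumable resource
TaskPlugin=task/cgroup,task/affinity  ; task/cgroup does the actual enforcement
ProctrackType=proctrack/cgroup
```

Without CR_Core_Memory (or CR_CPU_Memory), memory isn't a consumable resource, so the cgroup limits have nothing to enforce against.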

My cgroup.conf is this:

CgroupMountpoint="/sys/fs/cgroup"
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm/cgroup"
ConstrainCores=yes
TaskAffinity=no
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
ConstrainDevices=yes
AllowedDevicesFile="/etc/slurm/cgroup_allowed_devices_file.conf"
AllowedRamSpace=100
AllowedSwapSpace=0
MaxRAMPercent=100
MaxSwapPercent=0
MinRAMSpace=30

Tina

On 05/10/2021 09:15, Hermann Schwärzler wrote:
Hi Rodrigo,

a possible solution is using

VSizeFactor=100

in slurm.conf.

With this setting, programs that try to allocate more memory than requested in the job's specification will fail.

Be aware that this puts a limit on *virtual* memory, not on RSS. This might or might not be what you want, as a lot of programs tend to allocate (a lot) more virtual memory than they actually use (RSS).
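
To see the difference, here is a small Python sketch (my own illustration, not from the thread) that reserves a large chunk of virtual address space without touching it. A VSizeFactor limit would count the full reservation, while RSS stays small until the pages are actually written:

```python
import mmap
import resource

# Reserve 1 GiB of anonymous virtual memory. Linux does not back the
# mapping with physical pages until they are touched, so this inflates
# the process's virtual size (VSZ) but barely changes its RSS.
buf = mmap.mmap(-1, 1 << 30)

# ru_maxrss is reported in kilobytes on Linux.
rss_bytes = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss * 1024

# The resident set is far smaller than the 1 GiB reservation.
print(rss_bytes < (1 << 30))  # True

# Writing to the pages would fault them in and grow RSS:
# buf[0:4096] = b"x" * 4096
```

This is why a pure virtual-memory limit can kill programs (Java, some MPI stacks) that reserve big address ranges they never fully use, whereas a cgroup RSS limit only triggers on memory actually resident.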

Regards,
Hermann

On 10/5/21 12:46 AM, Rodrigo Santibáñez wrote:
Hello Slurm Users,

I'm having a hard time configuring Slurm to kill jobs when they use more memory than requested. Also, I can't make jobs use only RAM; some of them start to use swap.

I don't know what I'm missing.

Thanks for your help

slurmd -V
slurm 20.02.6

slurm.conf
TaskPlugin=task/affinity,task/cgroup
ProctrackType=proctrack/cgroup

cgroup.conf
AllowedRAMSpace=100.0
AllowedSwapSpace=0.0
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
MemorySwappiness=0
CgroupAutomount=yes
ConstrainCores=yes


--
Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator

Research Computing and Support Services
IT Services, University of Oxford
http://www.arc.ox.ac.uk http://www.it.ox.ac.uk
