Hi Rodrigo,
we do pretty much what you do (constrain via cgroups), and it works
fine. So I know it's possible. (I don't think I've ever twiddled
VSizeFactor.)
I think you also need
SelectType=select/cons_res (or select/cons_tres)
SelectTypeParameters=CR_Core_Memory
in your slurm.conf; have you got that?
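To sanity-check that on a node, a small script can look for those two settings. This is only a rough Python sketch: the helper names are hypothetical, it is not something shipped with Slurm, and its Key=Value parsing is deliberately simplified (no Include files or continuation lines).

```python
import sys


def parse_slurm_conf(text):
    """Collect simple Key=Value settings, skipping blanks and # comments.

    Sketch only: Include directives, continuation lines and multi-pair
    lines such as NodeName=... definitions are not handled properly.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Lower-case keys so e.g. AllowedRamSpace and AllowedRAMSpace compare equal.
        settings[key.strip().lower()] = value.strip()
    return settings


def memory_scheduling_problems(settings):
    """Return warnings if memory does not look like a consumable resource."""
    problems = []
    select = settings.get("selecttype", "")
    if select not in ("select/cons_res", "select/cons_tres"):
        problems.append(
            f"SelectType is {select or 'unset'}; "
            "expected select/cons_res or select/cons_tres"
        )
    params = [p.strip().lower()
              for p in settings.get("selecttypeparameters", "").split(",")]
    # CR_Memory, CR_Core_Memory, CR_CPU_Memory, CR_Socket_Memory all end in _memory.
    if not any(p.endswith("_memory") for p in params):
        problems.append("SelectTypeParameters has no memory option (e.g. CR_Core_Memory)")
    return problems


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: check_slurm_conf.py /etc/slurm/slurm.conf")
    else:
        with open(sys.argv[1]) as fh:
            for problem in memory_scheduling_problems(parse_slurm_conf(fh.read())):
                print("WARNING:", problem)
```

Run against the slurm.conf lines quoted further down (only TaskPlugin and ProctrackType), it would report both settings as missing.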
My cgroup.conf is this:
CgroupMountpoint="/sys/fs/cgroup"
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm/cgroup"
ConstrainCores=yes
TaskAffinity=no
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
ConstrainDevices=yes
AllowedDevicesFile="/etc/slurm/cgroup_allowed_devices_file.conf"
AllowedRamSpace=100
AllowedSwapSpace=0
MaxRAMPercent=100
MaxSwapPercent=0
MinRAMSpace=30
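As I read cgroup.conf(5), these settings combine roughly as follows: AllowedRAMSpace scales the job's memory request, MaxRAMPercent caps the result at a share of the node's RAM, MinRAMSpace (in MB) sets a floor, and AllowedSwapSpace adds that percentage of the request on top for the RAM+swap limit. The Python sketch below is an approximation under those assumptions (function names hypothetical; MaxSwapPercent clamping is left out, and exact behaviour may differ between Slurm versions). With AllowedSwapSpace=0 the RAM+swap limit equals the RAM limit, so a job has no room to spill into swap.

```python
MB = 1024 * 1024


def ram_limit_bytes(alloc_mb, node_ram_mb, allowed_ram_pct=100.0,
                    max_ram_pct=100.0, min_ram_mb=30):
    """Approximate RAM limit for a job cgroup, in bytes.

    AllowedRAMSpace scales the allocation, MaxRAMPercent caps it at a
    share of the node's RAM, and MinRAMSpace sets a floor (in MB).
    """
    limit = alloc_mb * MB * allowed_ram_pct / 100.0
    ceiling = node_ram_mb * MB * max_ram_pct / 100.0
    lower = min_ram_mb * MB
    return int(max(lower, min(limit, ceiling)))


def ram_plus_swap_limit_bytes(alloc_mb, node_ram_mb, allowed_swap_pct=0.0,
                              **ram_kwargs):
    """Approximate RAM+swap limit: the RAM limit plus AllowedSwapSpace
    percent of the allocation. MaxSwapPercent clamping is left out."""
    ram = ram_limit_bytes(alloc_mb, node_ram_mb, **ram_kwargs)
    return ram + int(alloc_mb * MB * allowed_swap_pct / 100.0)


if __name__ == "__main__":
    # Hypothetical node with 192 GB of RAM, job submitted with --mem=4G.
    node_mb, job_mb = 192 * 1024, 4 * 1024
    print(ram_limit_bytes(job_mb, node_mb))            # 4294967296 (4 GiB)
    print(ram_plus_swap_limit_bytes(job_mb, node_mb))  # 4294967296: no swap headroom
```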
Tina
On 05/10/2021 09:15, Hermann Schwärzler wrote:
Hi Rodrigo,
a possible solution is using
VSizeFactor=100
in slurm.conf.
With this setting, programs that try to allocate more memory than
the job requested will fail.
Be aware that this puts a limit on *virtual* memory, not on RSS. This
may or may not be what you want, as many programs allocate (a lot)
more virtual memory than they actually use (RSS).
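To put numbers on it: the slurm.conf man page defines VSizeFactor as a percentage of the job's real-memory limit (its own example is a 500 MB job with VSizeFactor=101 being allowed 505 MB of virtual memory), and the default of 0 disables virtual-memory enforcement. A tiny Python sketch of that arithmetic; the function name is hypothetical and rounding down is an assumption of the sketch.

```python
def vsize_limit_mb(real_mem_limit_mb, vsize_factor):
    """Virtual-memory limit implied by VSizeFactor, in MB.

    VSizeFactor is a percentage of the job's real-memory limit; 0 (the
    default) disables virtual-memory enforcement, signalled here by None.
    Rounding down is an assumption of this sketch.
    """
    if vsize_factor == 0:
        return None
    return real_mem_limit_mb * vsize_factor // 100


if __name__ == "__main__":
    print(vsize_limit_mb(500, 101))   # 505
    print(vsize_limit_mb(4096, 100))  # 4096: virtual size capped at the --mem request
```

With VSizeFactor=100, a job that asked for 4 GB but maps 6 GB of virtual memory (while touching far less) would still hit the limit, which is the caveat about virtual memory versus RSS.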
Regards,
Hermann
On 10/5/21 12:46 AM, Rodrigo Santibáñez wrote:
Hello Slurm Users,
I'm having a hard time configuring Slurm to kill jobs when they use
more memory than requested. Also, I can't make jobs use only RAM, and
some of them start to use swap.
I don't know what I'm missing.
Thanks for your help
slurmd -V
slurm 20.02.6
slurm.conf
TaskPlugin=task/affinity,task/cgroup
ProctrackType=proctrack/cgroup
cgroup.conf
AllowedRAMSpace=100.0
AllowedSwapSpace=0.0
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
MemorySwappiness=0
CgroupAutomount=yes
ConstrainCores=yes
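One way to check whether enforcement works is a test job that deliberately uses more memory than it requested. A minimal Python sketch (the script name memhog.py, the 64 MB chunk size and the 4 KiB page size are assumptions); it writes one byte per page so the memory becomes resident rather than just reserved, and keeps every chunk referenced so nothing is freed.

```python
import sys
import time

MB = 1024 * 1024
PAGE = 4096  # assumes 4 KiB pages


def touch_memory(total_mb, chunk_mb=64):
    """Allocate total_mb of memory in chunks and write to every page."""
    chunks = []
    done = 0
    while done < total_mb:
        size = min(chunk_mb, total_mb - done)
        buf = bytearray(size * MB)
        for offset in range(0, len(buf), PAGE):
            buf[offset] = 1  # dirty the page so it counts towards RSS
        chunks.append(buf)  # keep a reference so nothing is freed
        done += size
        print(f"touched {done} MB", flush=True)
    return chunks


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: memhog.py MEGABYTES")
    else:
        held = touch_memory(int(sys.argv[1]))
        time.sleep(60)  # hold the memory so accounting can sample it
```

Submitted with something like `sbatch --mem=1G --wrap "python3 memhog.py 2048"`, the job should be killed at around 1 GB if RAM and swap are both being constrained; `sacct -j <jobid> -o JobID,State,MaxRSS` then shows how it ended. If it runs to completion instead, the memory limit is not being enforced.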
--
Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator
Research Computing and Support Services
IT Services, University of Oxford
http://www.arc.ox.ac.uk http://www.it.ox.ac.uk