On 11/01/2019 08.29, Sergey Koposov wrote:
> Hi,
>
> I've recently migrated to Slurm from PBS on our cluster. Because of that, the
> job memory limits are now strictly enforced, and that causes my code to get
> killed. The trick is that my code uses memory mapping (i.e. mmap) of one
> single large file (~12 GB) in each thread on each node. With this technique,
> in the past, even though the file was (read-only) mmaped in, say, 16 threads,
> the actual memory footprint was still ~12 GB. However, when I now do this
> under Slurm, it thinks that each thread (or process) takes 12 GB and kills my
> processes.
>
> Does anyone have a way around this problem, other than no longer using memory
> as a consumable resource, or faking that each node has more memory?
>
> Here is an example Slurm script that I'm running:
>
> #!/bin/bash
> #SBATCH -N 1                 # number of nodes
> #SBATCH --cpus-per-task=10   # number of cores
> #SBATCH --ntasks-per-node=1
> #SBATCH --mem=125GB
> #SBATCH --array=0-4
>
> sh script1.sh $SLURM_ARRAY_TASK_ID 5
>
> script1.sh essentially starts Python, which in turn creates 10
> multiprocessing processes, each of which mmaps the large file.
>
> In this case I'm forced to limit myself to using only 10 threads instead of
> 16 (our machines have 16 cores) to avoid being killed by Slurm.
>
> Thanks in advance for any suggestions.
>
> Sergey

What is your memory limit configuration in Slurm? Anyway, a few things to check:
- Make sure you're not limiting RLIMIT_AS in any way (e.g. run "ulimit -v" in
  your batch script and ensure it's unlimited; in the Slurm config, ensure
  VSizeFactor=0).

- Are you using task/cgroup for limiting memory? In that case the problem
  might be that cgroup memory limits work with RSS, and as you're running
  multiple processes, the shared mmaped file will be counted multiple times.
  There's no really good way around this, but with, say, something like

      ConstrainRAMSpace=no
      ConstrainSwapSpace=yes
      AllowedRAMSpace=100
      AllowedSwapSpace=1600

  you'll get a setup where the cgroup soft limit is set to the amount your
  job allocates, but the hard limit (where the job will be killed) is set to
  1600% of that.

- If you're using cgroups for memory limits, you should also set
  JobAcctGatherParams=NoOverMemoryKill.

- If you're NOT using cgroups for memory limits, try setting
  JobAcctGatherParams=UsePSS, which should avoid counting the shared mappings
  multiple times.

--
Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
Aalto University School of Science, PHYS & NBE
+358503841576 || janne.blomqv...@aalto.fi