On 11/01/2019 08.29, Sergey Koposov wrote:
> Hi,
>
> I've recently migrated from PBS to Slurm on our cluster. As a result, job 
> memory limits are now strictly enforced, and that causes my code to get 
> killed.
> The trick is that my code uses memory mapping (mmap) of one single large 
> file (~12 GB) in each thread on each node. With this technique, in the 
> past, even though the file was mmap'ed (read-only) in, say, 16 threads, 
> the actual memory footprint was still ~12 GB. However, when I now do this 
> under Slurm, it thinks that each thread (or process) takes 12 GB and 
> kills my processes.
>
> Does anyone have a way around this problem, other than no longer using 
> memory as a consumable resource, or pretending that each node has more 
> memory than it does?
>
> Here is an example Slurm script that I'm running:
> #!/bin/bash
> #SBATCH -N 1 # number of nodes
> #SBATCH --cpus-per-task=10 # number of cores
> #SBATCH --ntasks-per-node=1
> #SBATCH --mem=125GB
> #SBATCH --array=0-4
>
> sh script1.sh $SLURM_ARRAY_TASK_ID 5
>
> script1 essentially starts Python, which in turn creates 10 
> multiprocessing processes, each of which mmaps the large file.
> ------
> In this case I'm forced to limit myself to only 10 threads instead of 16 
> (our machines have 16 cores) to avoid being killed by Slurm.
> ---
> Thanks in advance for any suggestions.
>
> Sergey
>
What is your memory-limit configuration in Slurm? In any case, a few things to check:

- Make sure you're not limiting RLIMIT_AS in any way: run "ulimit -v" in 
your batch script and check that it reports "unlimited", and in slurm.conf 
make sure VSizeFactor=0.
- Are you using task/cgroup for limiting memory? If so, the problem might 
be that cgroup memory limits work on RSS, and since you're running multiple 
processes, the shared mmap'ed file is counted once per process. There's no 
really good way around this, but with something like

ConstrainRAMSpace=no
ConstrainSwapSpace=yes
AllowedRAMSpace=100
AllowedSwapSpace=1600
you'll get a setup where the cgroup soft limit is set to the amount your 
job allocates, while the hard limit (where the job gets killed) is set to 
1600% of that. E.g. with --mem=125GB, the soft limit is 125 GB, but the 
job is only killed if it exceeds 2000 GB.
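
In case it helps, here's a minimal sketch of which file each of these 
settings goes in (this assumes the task/cgroup setup above; the values are 
just the example numbers from this mail):

# cgroup.conf
ConstrainRAMSpace=no
ConstrainSwapSpace=yes
AllowedRAMSpace=100
AllowedSwapSpace=1600

# slurm.conf
TaskPlugin=task/cgroup
VSizeFactor=0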
- If you're using cgroups for memory limits, you should also set 
JobAcctGatherParams=NoOverMemoryKill in slurm.conf, so the accounting 
plugin doesn't additionally kill jobs based on its own RSS-based numbers.
- If you're NOT using cgroups for memory limits, try setting 
JobAcctGatherParams=UsePSS, which should avoid counting the shared 
mappings multiple times (the sketch below illustrates why RSS 
multiple-counts them while PSS doesn't).

-- 
Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
Aalto University School of Science, PHYS & NBE
+358503841576 || janne.blomqv...@aalto.fi

