So I decided to eat my own dog food and tested this out myself. First
of all, running ulimit through srun "naked" like that doesn't work,
since ulimit is a bash shell builtin rather than an executable, so I
had to write a simple shell script:
$ cat ulimit.sh
#!/bin/bash
ulimit -a
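(As an aside, if you don't want a wrapper script, launching a shell
through srun and letting it run the builtin should work just as well,
e.g.

$ srun -N1 -n1 -t 00:01:00 --mem=1G bash -c 'ulimit -a'

I stuck with the script above for the tests below.)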
By default, core file size is set to zero in our environment as a good
security practice and to keep users' core dumps from filling up the
filesystem.
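For reference, if that kind of default is enforced through
/etc/security/limits.conf, the entry looks something like this
(illustrative only; your site may set the default elsewhere, e.g. in
login defaults):

*    soft    core    0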
My default ulimit settings:
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 128054
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Now I run my ulimit.sh script through srun:
$ srun -N1 -n1 -t 00:01:00 --mem=1G ./ulimit.sh
srun: job 1249977 queued and waiting for resources
srun: job 1249977 has been allocated resources
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 257092
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) 1048576
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Note that core file size is still 0 in the job, matching my shell. Now
I set the core size in my shell:
$ ulimit -c 1024
(base) [pbisbal@sunfire01 ulimit]$ ulimit -c
1024
And run ulimit.sh through srun again:
$ srun -N1 -n1 -t 00:01:00 --mem=1G ./ulimit.sh
srun: job 1249978 queued and waiting for resources
srun: job 1249978 has been allocated resources
core file size (blocks, -c) 1024
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 257092
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) 1048576
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
This confirms that the limits propagated by PropagateResourceLimits
come from the user's environment at submission time, not from PAM. If
you have UsePAM enabled, as Ryan suggested in a previous e-mail, that
puts *upper limits* on the values propagated by
PropagateResourceLimits. According to the slurm.conf man page, it
doesn't necessarily override the limits set in the environment when
the job is submitted:
UsePAM  If set to 1, PAM (Pluggable Authentication Modules for Linux)
        will be enabled. PAM is used to establish the upper bounds for
        resource limits. With PAM support enabled, local system
        administrators can dynamically configure system resource
        limits. Changing the upper bound of a resource limit will not
        alter the limits of running jobs, only jobs started after a
        change has been made will pick up the new limits. The default
        value is 0 (not to enable PAM support)....
So, with UsePAM=1 and PropagateResourceLimits=ALL (the default for
that setting): if I set my core file size to 0 and
/etc/security/limits.conf on the compute node sets it to 1024, my
job's core file size will stay 0. If I instead set it to 2048 in my
shell, then Slurm will reduce the job's limit to 1024.
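To spell out the combination I'm describing, this is roughly what it
looks like in configuration terms (values are illustrative, not a
recommendation):

slurm.conf:
UsePAM=1
PropagateResourceLimits=ALL

/etc/security/limits.conf on the compute nodes:
*    hard    core    1024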
Note that setting UsePAM=1 alone isn't enough. You also need to
configure a PAM stack named slurm (e.g., /etc/pam.d/slurm), as Ryan
pointed out.
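In case it helps, a minimal /etc/pam.d/slurm for this purpose usually
looks something along these lines (a sketch, adjust for your site;
pam_limits.so is the piece that actually applies
/etc/security/limits.conf):

auth      required  pam_localuser.so
account   required  pam_unix.so
session   required  pam_limits.so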
Prentice
On 4/29/21 12:35 PM, Prentice Bisbal wrote:
On 4/28/21 2:26 AM, Diego Zuccato wrote:
On 27/04/2021 17:31, Prentice Bisbal wrote:
I don't think PAM comes into play here. Since Slurm is starting the
processes on the compute nodes as the user, etc., PAM is being
bypassed.
Then maybe slurmd somehow goes through the PAM stack another way,
since limits on the frontend got propagated (as implied by
PropagateResourceLimits' default value of ALL).
And I can confirm that setting it to NONE seems to have solved the
issue: users on the frontend get limited resources, and jobs on the
nodes get the resources they asked for.
In this case, Slurm is deliberately looking at the resource limits in
effect on the submission host when the job is submitted, and then
copying them to the job's environment. From the slurm.conf
documentation (https://slurm.schedmd.com/slurm.conf.html):
*PropagateResourceLimits*
A comma-separated list of resource limit names. The slurmd daemon
uses these names to obtain the associated (soft) limit values
from the user's process environment on the submit node. These
limits are then propagated and applied to the jobs that will run
on the compute nodes.
Then later on, it indicates that all resource limits are propagated by
default:
The following limit names are supported by Slurm (although some
options may not be supported on some systems):
*ALL*
All limits listed below (default)
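(A related knob, in case you only want to keep specific limits from
following the job: slurm.conf also has a PropagateResourceLimitsExcept
option. Something like the following should propagate everything
except the core file size limit, though I haven't tested it:

PropagateResourceLimitsExcept=CORE
)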
You should be able to verify this yourself in the following manner:
1. Start two separate shells on the submission host
2. Change the limits in one of the shells. For example, reduce core
size to 0, with 'ulimit -c 0' in just one shell.
3. Then run 'srun ulimit -a' from each shell.
4. Compare the output. The shell where you changed the limit should
show that core size is now zero.
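A concrete version of that test, using a wrapper script since ulimit
is a shell builtin (a sketch, not tested as written):

# Shell 1
$ ulimit -c 0
$ srun -N1 -n1 -t 00:01:00 ./ulimit.sh

# Shell 2
$ ulimit -c 1024
$ srun -N1 -n1 -t 00:01:00 ./ulimit.sh

The first job's output should report a core file size of 0 and the
second's should report 1024.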
--
Prentice