What I said in my last e-mail (which you probably haven't gotten to yet)
is similar to this case. On its own, Slurm wouldn't propagate resource
limits, but that has been added as a function. In your case, Slurm has
functionality built into it where you can tell it to use PAM. With that
functionality enabled, as you have done, Slurm won't bypass PAM.
This is similar to SSH, where you can enable the UsePAM feature.
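To make the parallel concrete (a sketch, not your actual config; the
option names are real, the file contents are illustrative):

    # slurm.conf -- have slurmd run its PAM stack when launching tasks
    UsePAM=1

    # sshd_config -- the analogous switch in OpenSSH
    UsePAM yes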
From my reading of the documentation for PropagateResourceLimits, I
think Slurm looks at the limits in effect in the environment where the
job is submitted, not at /etc/security/limits.conf via PAM. In my
previous e-mail, I suggested a method to test this, but I haven't tested
it myself. Yet.
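Roughly, something like this should settle it (untested, and the 256 is
just an arbitrary marker value):

    # On the login node, lower a soft limit in the submitting shell only:
    $ ulimit -S -n 256
    # Submit a job that just reports the limit it sees:
    $ sbatch -o limits.out --wrap 'ulimit -Sn'
    # With PropagateResourceLimits=ALL (the default), limits.out should
    # show 256; with NONE, it should show the compute node's own default.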
Prentice
On 4/29/21 12:54 PM, Ryan Novosielski wrote:
It may not for PropagateResourceLimits specifically (as I said, the docs
are a little sparse on how this actually works), but you're not correct
that PAM doesn't come into play re: user jobs. If you have UsePAM=1 set,
and have an /etc/pam.d/slurm, as our site does, there is some amount of
interaction here, and PAM definitely affects user jobs.
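As a minimal sketch of what I mean (illustrative only; site stacks vary,
and ours has more in it):

    # /etc/pam.d/slurm
    account   required   pam_slurm.so
    session   required   pam_limits.so   # applies /etc/security/limits.conf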
On Apr 27, 2021, at 11:31 AM, Prentice Bisbal <pbis...@pppl.gov> wrote:
I don't think PAM comes into play here. Since Slurm starts the processes
on the compute nodes directly as the user, PAM is bypassed.
Prentice
On 4/22/21 10:55 AM, Ryan Novosielski wrote:
My recollection is that this parameter deals with "ulimit" resource
limits and doesn't have anything to do with cgroups. The documentation
is not as clear as it could be about what this does, the mechanism by
which it's applied (PAM module), etc.
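For what it's worth, besides ALL and NONE the parameter also accepts a
list of specific rlimit names; e.g. (illustrative):

    # slurm.conf -- propagate only the locked-memory limit (a common
    # choice on InfiniBand clusters), leave the rest at node defaults
    PropagateResourceLimits=MEMLOCK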
On Apr 22, 2021, at 09:07, Diego Zuccato <diego.zucc...@unibo.it> wrote:
Hello all.
I need a clarification about PropagateResourceLimits.
If I set it to NONE, will cgroups still limit the resources a job can
use on the worker node(s), effectively decoupling the limits on the
frontend from the limits on the worker nodes?
I've been bitten by the default being ALL: when I tried to limit the
memory users can use on the frontend to 1GB soft / 4GB hard, jobs began
to fail at startup even when they requested 200G (which is available on
the worker nodes but not on the frontend)...
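If it helps, this is roughly the setup (reconstructed from memory; I'm
assuming the "as" address-space limit here, the actual line may differ):

    # /etc/security/limits.conf on the frontend (values in KB for "as")
    *   soft   as   1048576   # 1 GB soft
    *   hard   as   4194304   # 4 GB hard
    # With the default PropagateResourceLimits=ALL, sbatch copies these
    # rlimits from my shell into the job, so even a job requesting 200G
    # inherits the 1 GB soft limit and fails at startup.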
Tks.
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786