What I said in my last e-mail (which you probably haven't gotten to yet) is similar to this case. On its own, Slurm wouldn't propagate resource limits, but that has been added as a function. In your case, Slurm has functionality built into it where you can tell it to use PAM. Without that functionality built in and enabled, as you have done, Slurm would bypass PAM.

This is similar to SSH, where you can enable the UsePAM feature.
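For comparison, the two switches side by side (both options exist; the values shown are just the typical "on" settings):

    # slurm.conf: run user jobs through the PAM stack
    UsePAM=1

    # sshd_config: the analogous switch in OpenSSH
    UsePAM yes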

From my reading of the documentation for PropagateResourceLimits, I think Slurm looks at the limits in effect in the environment where the job is submitted, not at /etc/security/limits.conf via PAM. In my previous e-mail, I provided a method to test this, but I haven't tested it myself. Yet.
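Something like this should show it (an untested sketch; srun's --propagate option is documented, the limit value here is arbitrary):

    # change a soft limit in the submission shell, then see if the job inherits it
    $ ulimit -S -n 2048
    $ srun --propagate=ALL  bash -c 'ulimit -S -n'   # expect 2048 if limits propagate
    $ srun --propagate=NONE bash -c 'ulimit -S -n'   # expect the node's own default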


Prentice

On 4/29/21 12:54 PM, Ryan Novosielski wrote:
It may not, specifically for PropagateResourceLimits – as I said, the docs are a 
little sparse on how this actually works – but you’re not correct that PAM 
doesn’t come into play re: user jobs. If you have “UsePAM=1” set, and have an 
/etc/pam.d/slurm, as our site does, there is some amount of interaction here, 
and PAM definitely affects user jobs.
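For reference, an /etc/pam.d/slurm stack might look something like this (a sketch only; the module choices differ per site):

    # /etc/pam.d/slurm (illustrative; each site's stack differs)
    account  required  pam_unix.so
    session  required  pam_limits.so    # applies /etc/security/limits.conf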

On Apr 27, 2021, at 11:31 AM, Prentice Bisbal <pbis...@pppl.gov> wrote:

I don't think PAM comes into play here. Since Slurm is starting the processes 
on the compute nodes as the user, etc., PAM is being bypassed.

Prentice


On 4/22/21 10:55 AM, Ryan Novosielski wrote:
My recollection is that this parameter is talking about “ulimit” parameters 
and doesn’t have anything to do with cgroups. The documentation is not as 
clear as it could be about what this does, the mechanism by which it’s applied 
(PAM module), etc.
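You can see the two mechanisms side by side from inside a job (a sketch; the cgroup path assumes cgroup v1 with the task/cgroup plugin and will differ under cgroup v2):

    # per-process ulimits: what PropagateResourceLimits / pam_limits affect
    $ srun bash -c 'ulimit -a'

    # the job's cgroup memory cap: what cgroup.conf enforces
    $ srun bash -c 'cat /sys/fs/cgroup/memory/slurm/uid_$(id -u)/job_${SLURM_JOB_ID}/memory.limit_in_bytes'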


On Apr 22, 2021, at 09:07, Diego Zuccato <diego.zucc...@unibo.it> wrote:

Hello all.

I'd need a clarification about PropagateResourceLimits.
If I set it to NONE, will cgroups still limit the resources a job can use on 
the worker node(s), effectively decoupling the limits on the frontend from the 
limits on the worker nodes?
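For concreteness, the combination I'm asking about would be (a sketch; both parameters are documented, the values are just an example):

    # slurm.conf: don't copy the submit host's ulimits into jobs
    PropagateResourceLimits=NONE
    TaskPlugin=task/cgroup

    # cgroup.conf: enforce each job's requested memory via cgroups
    ConstrainRAMSpace=yes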

I've been bitten by the default being ALL: when I tried to limit the memory 
users can use on the frontend to 1 GB soft / 4 GB hard, jobs began to fail at 
startup even if they requested 200 GB (which is available on the worker nodes 
but not on the frontend)...
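The frontend limits were roughly of this kind (a reconstruction; the actual limits.conf item used may differ, "as" is address space in KB):

    # /etc/security/limits.conf on the frontend
    *  soft  as  1048576     # 1 GB
    *  hard  as  4194304     # 4 GB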

Tks.

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786

