Hi,
IMHO, this PAM solution is very neat. It only lacks a network
limit (maybe just add a traffic shaper, like tc?).
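
A minimal sketch of the tc idea, assuming a token bucket filter (tbf) on the login node's outgoing interface. The helper name shape_cmd, the interface eth0, and the rate/burst/latency values are placeholders, not tested recommendations. Note that this caps the whole interface, not individual users; per-user shaping would additionally need some way to classify traffic by user.

```shell
#!/bin/sh
# Sketch only: build a tc command that caps egress bandwidth on one interface.
# shape_cmd is a hypothetical helper; device and rate are caller-supplied.
shape_cmd() {
    dev=$1
    rate=$2
    # "replace" creates the root qdisc or swaps out an existing one.
    # burst and latency are placeholder values that would need tuning.
    echo "tc qdisc replace dev $dev root tbf rate $rate burst 256kb latency 50ms"
}

# Dry run: print the command for review instead of applying it.
shape_cmd eth0 100mbit
```

The function only prints the command, so it can be reviewed first; applying it requires root on the login node.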
Best regards
On 26/03/2021 at 17:30, Lohit Valleru via Beowulf wrote:
I have just used a simple PAM script to apply cgroup rules to every
user who logs into a CentOS 7 login node.
Something like this:
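#!/bin/sh -e
# Run by PAM at login; assumes PAM_USER is in the environment (as pam_exec provides).
PAM_UID=$(getent passwd "${PAM_USER}" | cut -d: -f3)
# Only limit regular accounts (UID >= 1000); skip if the lookup returned nothing.
if [ -n "${PAM_UID}" ] && [ "${PAM_UID}" -ge 1000 ]; then
    # Cap the user's slice at one core's worth of CPU time and 2G of memory.
    /bin/systemctl set-property "user-${PAM_UID}.slice" \
        CPUQuota=100% MemoryLimit=2G
fi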
This is not sophisticated and does not change parameters depending
on dynamic load, but it does set static cgroup limits for every
user.
However, the above does not cover every scenario: it does not
restrict the number of threads, network load, network file system load
(NFS/GPFS/Lustre), paging, etc.
I have actually seen cases where cgroups caused more stress on the
node while trying to limit resources such as memory for users who run
hundreds of threads yet still manage to stay within the memory/CPU
limit. A cgroup does not necessarily kill an application that pushes
against its limits, as long as the application manages to stay within
them.
I tried limiting the number of threads with cgroups, and it caused
issues: ssh connections were killed when the thread count went beyond
the limit.
Also, I recently realized that Java does not recognize cgroups
for its garbage collection, and instead assumes that all of physical
memory is available.
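
One possible workaround for the Java case is to set the heap ceiling explicitly rather than rely on the JVM's default sizing. A minimal sketch, assuming the 2G MemoryLimit from the script above and an arbitrary 75% share of it for the heap; heap_mb_for_limit is a hypothetical helper name. JAVA_TOOL_OPTIONS is an environment variable that the HotSpot JVM reads at startup.

```shell
#!/bin/sh
# Sketch only: derive a JVM heap cap from a memory limit given in bytes.
# heap_mb_for_limit is a hypothetical helper; the 75% share is an assumption.
heap_mb_for_limit() {
    limit_bytes=$1
    # Integer arithmetic: 75% of the limit, expressed in MiB.
    echo $(( limit_bytes * 3 / 4 / 1024 / 1024 ))
}

# Example: the 2G MemoryLimit used in the PAM script.
limit=$(( 2 * 1024 * 1024 * 1024 ))
JAVA_TOOL_OPTIONS="-Xmx$(heap_mb_for_limit "$limit")m"
export JAVA_TOOL_OPTIONS
echo "$JAVA_TOOL_OPTIONS"
```

Where this gets set (for example a login profile script) is a site decision, and users can still override it on their own command line.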
I do not know if Arbiter somehow resolved the above issues and
behaves much better than simple cgroup limits, or if Red Hat 8 happens
to be better.
I do want to mention that, for an ideal solution, I agree with Chris
Dagdigian's response: it is best to educate users and follow up
accordingly.
At the same time, I do wish there was a good solution. I also thought
about writing an ssh wrapper around an interactive bsub/qsub job
command, which would let users use compute nodes as interactive nodes
for a while to compile, edit, or submit their scripts. But this would
only be easy if all the compute nodes were directly reachable over the
network rather than restricted to a private network.
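
A rough sketch of what the core of such a wrapper might look like, assuming LSF's "bsub -Is" and PBS/Torque's "qsub -I" as the interactive submission commands. The helper name interactive_cmd and the scheduler labels are hypothetical, and any queue or resource options a site requires are left out.

```shell
#!/bin/sh
# Sketch only: choose the interactive-job command for the site's scheduler.
# interactive_cmd is a hypothetical helper; "lsf" and "pbs" are made-up labels.
interactive_cmd() {
    case "$1" in
        lsf) echo "bsub -Is /bin/bash" ;;  # LSF: interactive shell with a pseudo-terminal
        pbs) echo "qsub -I" ;;             # PBS/Torque: interactive job
        *)   echo "unknown scheduler: $1" >&2; return 1 ;;
    esac
}

# Dry run: show what the wrapper would execute for an LSF site.
interactive_cmd lsf
```

A real wrapper would exec the chosen command in place of the user's login shell.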
Thank you,
Lohit
On Fri, Mar 26, 2021 at 10:27 AM Prentice Bisbal via Beowulf
<beowulf@beowulf.org> wrote:
Yes, there's a tool developed specifically for this called Arbiter that
uses Linux cgroups to dynamically limit resources on a login node based
on its current load. It was developed at the University of Utah:
https://dylngg.github.io/resources/arbiterTechPaper.pdf
https://gitlab.chpc.utah.edu/arbiter2/arbiter2
Prentice
On 3/26/21 9:56 AM, Michael Di Domenico wrote:
> does anyone have a recipe for limiting the damage people can do on
> login nodes on rhel7. i want to limit the allocatable cpu/mem per
> user to some low value. that way if someone kicks off a program but
> forgets to 'srun' it first, they get bound to a single core and don't
> bump anyone else.
>
> i've been poking around the net, but i can't find a solution, i don't
> understand what's being recommended, and/or i'm implementing the
> suggestions wrong. i haven't been able to get them working. the most
> succinct answer i found is that per user cgroup controls have been
> implemented in systemd v239/240, but since rhel7 is still on v219
> that's not going to help. i also found some wonkiness that runs a
> program after a user logs in and hacks at the cgroup files directly,
> but i couldn't get that to work.
>
> supposedly you can override the user-{UID}.slice unit file and jam in
> the cgroup restrictions, but I have hundreds of users; clearly that's
> not maintainable
>
> i'm sure others have already been down this road. any suggestions?
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
--
Rémy Dernat
Information Systems Project Manager
IR CNRS - ISI / ISEM