Hi,

We make use of SPLOSH, which makes the process of setting up cgroups a
little easier and more automated:

https://github.com/plaguedbypenguins/splosh
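
If you want the bare-bones version of the same idea, the PAM script
Lohit posted further down this thread only needs one line of PAM
configuration to hook it in; pam_exec exports PAM_USER into the
script's environment. A minimal sketch for sshd (the script path here
is just an example, put it wherever suits you):

    # /etc/pam.d/sshd -- run the cgroup-limit script when a session opens
    session    optional    pam_exec.so /usr/local/sbin/user-limits.sh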
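
Along the same lines, the "hundreds of unit files" worry raised at the
bottom of the thread may be avoidable: systemctl set-property writes
the per-slice drop-in for you, so a loop over the active sessions is
enough. An untested sketch, with the same static limits Lohit uses:

    # cap every logged-in human user (UID >= 1000); system users are left alone
    loginctl list-users --no-legend | awk '$1 >= 1000 {print $1}' |
    while read -r uid; do
        systemctl set-property "user-${uid}.slice" CPUQuota=100% MemoryLimit=2G
    done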

Cheers,
Carl.

On Mon, 29 Mar 2021 at 18:20, Rémy Dernat <remy.der...@umontpellier.fr> wrote:
>
> Hi,
>
> IMHO, this PAM solution is a very neat one. The only thing it lacks is
> network limiting (maybe just add a traffic-shaping solution, like tc?).
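
(On the network side: a plain tc token-bucket qdisc on the login node's
uplink will at least cap aggregate traffic, though it is not per-user.
An untested sketch, assuming the uplink is eth0:

    # cap total egress from the login node at 1 Gbit/s
    tc qdisc add dev eth0 root tbf rate 1gbit burst 128k latency 400ms

Per-user shaping would need more plumbing, e.g. net_cls classids plus
tc filters on top of that.)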

>
> Best regards
>
> On 26/03/2021 at 17:30, Lohit Valleru via Beowulf wrote:
>
> I have just used a simple PAM script to apply cgroup rules to every
> user who logs into a CentOS 7 login node. Something like this:
>
> #!/bin/sh -e
>
> # UID of the user logging in (PAM_USER comes from the PAM stack, e.g. pam_exec)
> PAM_UID=$(getent passwd "${PAM_USER}" | cut -d: -f3)
>
> # only constrain regular users, not system accounts
> if [ "${PAM_UID}" -ge 1000 ]; then
>     /bin/systemctl set-property "user-${PAM_UID}.slice" \
>         CPUQuota=100% MemoryLimit=2G
> fi
>
> This is not sophisticated and does not adjust its parameters to the
> dynamic load, but it does set static per-user limits through cgroups.
>
> However, the above does not cover every scenario: it does not restrict
> the number of threads, network load, network file system load
> (NFS/GPFS/Lustre), paging, etc.
> I have actually seen cases where cgroups caused more stress by trying
> to limit resources such as memory for users who ran hundreds of
> threads while still staying within the memory/CPU limits. It turns out
> that cgroups do not kill every application that goes beyond its
> limits, as long as the application tries to stay within them.
> I tried limiting the number of threads with cgroups, and it caused
> issues where ssh connections were killed once the thread count went
> over the limit.
> Also, I recently realized that Java does not recognize cgroups when
> sizing its heap for garbage collection, and instead assumes that all
> of physical memory is available.
>
> I do not know whether Arbiter somehow resolves the above issues and
> behaves much better than plain cgroup limits, or whether RHEL 8
> happens to be better.
>
> I do want to mention that for an ideal solution I go with Chris
> Dagdigian's response: it is best to educate users and follow up
> accordingly.
>
> At the same time, I do wish there were a good solution. I have also
> thought about writing an ssh wrapper around a bsub/qsub
> interactive-job command that would let users use compute nodes as
> interactive nodes for a while, to compile/edit or submit their
> scripts. But this would only be easy if all the compute nodes were
> directly reachable over the network, rather than restricted to a
> private network.
>
> Thank you,
> Lohit
>
> On Fri, Mar 26, 2021 at 10:27 AM Prentice Bisbal via Beowulf
> <beowulf@beowulf.org> wrote:
>>
>> Yes, there's a tool developed specifically for this called Arbiter,
>> which uses Linux cgroups to dynamically limit resources on a login
>> node based on its current load. It was developed at the University
>> of Utah:
>>
>> https://dylngg.github.io/resources/arbiterTechPaper.pdf
>>
>> https://gitlab.chpc.utah.edu/arbiter2/arbiter2
>>
>> Prentice
>>
>> On 3/26/21 9:56 AM, Michael Di Domenico wrote:
>>> Does anyone have a recipe for limiting the damage people can do on
>>> login nodes on RHEL 7? I want to limit the allocatable CPU/memory
>>> per user to some low value. That way, if someone kicks off a
>>> program but forgets to 'srun' it first, they get bound to a single
>>> core and don't bump anyone else.
>>>
>>> I've been poking around the net, but I can't find a solution, I
>>> don't understand what's being recommended, and/or I'm implementing
>>> the suggestions wrong; I haven't been able to get them working. The
>>> most succinct answer I found is that per-user cgroup controls were
>>> implemented in systemd v239/240, but since RHEL 7 is still on v219
>>> that's not going to help. I also found some wonkiness that runs a
>>> program after a user logs in and hacks at the cgroup files
>>> directly, but I couldn't get that to work.
>>>
>>> Supposedly you can override the user-{UID}.slice unit file and jam
>>> in the cgroup restrictions, but I have hundreds of users, so
>>> clearly that's not maintainable.
>>>
>>> I'm sure others have already been down this road. Any suggestions?
>
> --
> Rémy Dernat
> IT Project Manager
> IR CNRS - ISI / ISEM
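
One more note on the Java point above: recent JDKs (10+, and 8 from
update 191 onwards) can be told to size the heap from the cgroup memory
limit rather than from physical RAM. A sketch ("app.jar" is just a
placeholder):

    # make the JVM respect the cgroup memory limit instead of physical RAM
    java -XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0 -jar app.jar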