Hi,

We make use of SPLOSH, which makes the process of setting up cgroups a
little easier and more automated:

https://github.com/plaguedbypenguins/splosh
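
If you want the bare-bones version of the same idea, the PAM script
Lohit posted further down this thread only needs one line of PAM
configuration to hook it in; pam_exec exports PAM_USER into the
script's environment. A minimal sketch for sshd (the script path here
is just an example, put it wherever suits you):

    # /etc/pam.d/sshd -- run the cgroup-limit script when a session opens
    session    optional    pam_exec.so /usr/local/sbin/user-limits.sh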
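
Along the same lines, the "hundreds of unit files" worry raised at the
bottom of the thread may be avoidable: systemctl set-property writes
the per-slice drop-in for you, so a loop over the active sessions is
enough. An untested sketch, with the same static limits Lohit uses:

    # cap every logged-in human user (UID >= 1000); system users are left alone
    loginctl list-users --no-legend | awk '$1 >= 1000 {print $1}' |
    while read -r uid; do
        systemctl set-property "user-${uid}.slice" CPUQuota=100% MemoryLimit=2G
    done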

Cheers,
Carl.

On Mon, 29 Mar 2021 at 18:20, Rémy Dernat <remy.der...@umontpellier.fr> wrote:
>
> Hi,
>
> IMHO, this PAM solution is a very neat one. The only thing it lacks is
> network limiting (maybe just add a traffic-shaping solution, like tc?).
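
(On the network side: a plain tc token-bucket qdisc on the login node's
uplink will at least cap aggregate traffic, though it is not per-user.
An untested sketch, assuming the uplink is eth0:

    # cap total egress from the login node at 1 Gbit/s
    tc qdisc add dev eth0 root tbf rate 1gbit burst 128k latency 400ms

Per-user shaping would need more plumbing, e.g. net_cls classids plus
tc filters on top of that.)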

>
> Best regards
>
> On 26/03/2021 at 17:30, Lohit Valleru via Beowulf wrote:
>
> I have just used a simple PAM script to apply cgroup rules to every
> user who logs into a CentOS 7 login node. Something like this:
>
> #!/bin/sh -e
>
> # UID of the user logging in (PAM_USER comes from the PAM stack, e.g. pam_exec)
> PAM_UID=$(getent passwd "${PAM_USER}" | cut -d: -f3)
>
> # only constrain regular users, not system accounts
> if [ "${PAM_UID}" -ge 1000 ]; then
>     /bin/systemctl set-property "user-${PAM_UID}.slice" \
>         CPUQuota=100% MemoryLimit=2G
> fi
>
> This is not sophisticated and does not adjust its parameters to the
> dynamic load, but it does set static per-user limits through cgroups.
>
> However, the above does not cover every scenario: it does not restrict
> the number of threads, network load, network file system load
> (NFS/GPFS/Lustre), paging, etc.
> I have actually seen cases where cgroups caused more stress by trying
> to limit resources such as memory for users who ran hundreds of
> threads while still staying within the memory/CPU limits. It turns out
> that cgroups do not kill every application that goes beyond its
> limits, as long as the application tries to stay within them.
> I tried limiting the number of threads with cgroups, and it caused
> issues where ssh connections were killed once the thread count went
> over the limit.
> Also, I recently realized that Java does not recognize cgroups when
> sizing its heap for garbage collection, and instead assumes that all
> of physical memory is available.
>
> I do not know whether Arbiter somehow resolves the above issues and
> behaves much better than plain cgroup limits, or whether RHEL 8
> happens to be better.
>
> I do want to mention that for an ideal solution I go with Chris
> Dagdigian's response: it is best to educate users and follow up
> accordingly.
>
> At the same time, I do wish there were a good solution. I have also
> thought about writing an ssh wrapper around a bsub/qsub
> interactive-job command that would let users use compute nodes as
> interactive nodes for a while, to compile/edit or submit their
> scripts. But this would only be easy if all the compute nodes were
> directly reachable over the network, rather than restricted to a
> private network.
>
> Thank you,
> Lohit
>
> On Fri, Mar 26, 2021 at 10:27 AM Prentice Bisbal via Beowulf
> <beowulf@beowulf.org> wrote:
>>
>> Yes, there's a tool developed specifically for this called Arbiter,
>> which uses Linux cgroups to dynamically limit resources on a login
>> node based on its current load. It was developed at the University
>> of Utah:
>>
>> https://dylngg.github.io/resources/arbiterTechPaper.pdf
>>
>> https://gitlab.chpc.utah.edu/arbiter2/arbiter2
>>
>> Prentice
>>
>> On 3/26/21 9:56 AM, Michael Di Domenico wrote:
>>> Does anyone have a recipe for limiting the damage people can do on
>>> login nodes on RHEL 7? I want to limit the allocatable CPU/memory
>>> per user to some low value. That way, if someone kicks off a
>>> program but forgets to 'srun' it first, they get bound to a single
>>> core and don't bump anyone else.
>>>
>>> I've been poking around the net, but I can't find a solution, I
>>> don't understand what's being recommended, and/or I'm implementing
>>> the suggestions wrong; I haven't been able to get them working. The
>>> most succinct answer I found is that per-user cgroup controls were
>>> implemented in systemd v239/240, but since RHEL 7 is still on v219
>>> that's not going to help. I also found some wonkiness that runs a
>>> program after a user logs in and hacks at the cgroup files
>>> directly, but I couldn't get that to work.
>>>
>>> Supposedly you can override the user-{UID}.slice unit file and jam
>>> in the cgroup restrictions, but I have hundreds of users, so
>>> clearly that's not maintainable.
>>>
>>> I'm sure others have already been down this road. Any suggestions?
>
> --
> Rémy Dernat
> IT Project Manager
> IR CNRS - ISI / ISEM
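
One more note on the Java point above: recent JDKs (10+, and 8 from
update 191 onwards) can be told to size the heap from the cgroup memory
limit rather than from physical RAM. A sketch ("app.jar" is just a
placeholder):

    # make the JVM respect the cgroup memory limit instead of physical RAM
    java -XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0 -jar app.jar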