Re: [slurm-users] slurmstepd: error: Exceeded job memory limit at some point.

2018-02-15 Thread Williams, Jenny Avis
We see this here too. The behavior differs depending on whether the program runs out of the "standard" NFS or the GPFS filesystem. If the I/O is from NFS, there can be conditions where we see this with some frequency on a given problem. It does not happen every time, but it can be reproduced.

[slurm-users] How to deal with user running stuff in frontend node?

2018-02-15 Thread Petersen, Dirk
I think cgroups is probably more elegant, but here is another script: https://github.com/FredHutch/IT/blob/master/py/loadwatcher.py#L59 The email text is hard-coded, so please change it before using. We put this in place in Oct 2017 when things were getting out of control because folks were

Re: [slurm-users] How to deal with user running stuff in frontend node?

2018-02-15 Thread Nicholas McCollum
I had previously contacted Ryan Cox about his solution and worked with it a little to implement it on our CentOS 7 cluster. While I liked his solution, I felt it was a little complex for our needs. I'm a big fan of keeping stuff real simple, so I came up with two simple shell scripts to solve

Re: [slurm-users] How to deal with user running stuff in frontend node?

2018-02-15 Thread Ryan Cox
Manuel, We set up cgroups and also do cputime limits (60 minutes in our case) in limits.conf.  Before libcgroup had support for more generic "apply to each user" kind of thing, I created a pam module that handles all of that which still works well for creating per-user limits.  We also have s
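To illustrate what a CPU-time limit of this kind amounts to: the `cpu` item in limits.conf is expressed in minutes and maps onto the kernel's RLIMIT_CPU resource limit, which is counted in seconds. The following is only a minimal Python sketch under that assumption, not the PAM module or configuration described in the message. The helper names (`cpu_limit_seconds`, `run_with_cpu_limit`) are hypothetical, and the 60-minute value is the one quoted above.

```python
import resource
import subprocess
import sys

CPU_LIMIT_MINUTES = 60  # value quoted in the message; adjust per site


def cpu_limit_seconds(minutes):
    """Convert a limits.conf-style 'cpu' value (minutes) to RLIMIT_CPU seconds."""
    return minutes * 60


def _apply_cpu_limit(seconds):
    # Runs in the child just before exec. Only the soft limit is changed,
    # and it is never set above the existing hard limit.
    _soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
    if hard != resource.RLIM_INFINITY:
        seconds = min(seconds, hard)
    resource.setrlimit(resource.RLIMIT_CPU, (seconds, hard))


def run_with_cpu_limit(argv, minutes=CPU_LIMIT_MINUTES):
    """Run argv with a soft CPU-time limit (hypothetical helper, Unix only)."""
    seconds = cpu_limit_seconds(minutes)
    return subprocess.run(
        argv,
        preexec_fn=lambda: _apply_cpu_limit(seconds),
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Ask a child interpreter to report the soft CPU limit it inherited.
    probe = "import resource; print(resource.getrlimit(resource.RLIMIT_CPU)[0])"
    result = run_with_cpu_limit([sys.executable, "-c", probe])
    print("child soft CPU limit (s):", result.stdout.strip())
```

A process that exceeds the soft limit receives SIGXCPU, so short interactive work is unaffected while a long-running computation on the login node is eventually stopped.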

Re: [slurm-users] How to deal with user running stuff in frontend node?

2018-02-15 Thread Michael Jennings
On Thursday, 15 February 2018, at 16:11:29 (+0100), Manuel Rodríguez Pascual wrote: > Although this is not strictly related to Slurm, maybe you can recommend me > some actions to deal with a particular user. > > On our small cluster, currently there are no limits to run applications in > the fron

Re: [slurm-users] How to deal with user running stuff in frontend node?

2018-02-15 Thread Loris Bennett
Hi Manuel, Manuel Rodríguez Pascual writes: > Hi all, > > Although this is not strictly related to Slurm, maybe you can > recommend me some actions to deal with a particular user. > > On our small cluster, currently there are no limits to run > applications in the frontend. This is sometimes re

Re: [slurm-users] How to deal with user running stuff in frontend node?

2018-02-15 Thread John Hanks
I've used this with some success: https://github.com/JohannesBuchner/verynice. For CPU intensive things it works great, but you have to also set some memory limits in limits.conf if users do any large memory stuff. Otherwise I just use a problem process as a chance to start a conversation with that

Re: [slurm-users] How to deal with user running stuff in frontend node?

2018-02-15 Thread Pablo Escobar
Hi Manuel, A possible workaround is to configure a cgroups limit by user in the frontend node so a single user cannot allocate more than 1GB of ram (or whatever value you prefer). The user would still be able to abuse the machine but as soon as his memory usage goes above the limit his job will be

Re: [slurm-users] How to deal with user running stuff in frontend node?

2018-02-15 Thread Jeffrey Frey
Every cluster I've ever managed has this issue. Once cgroup support arrived in Linux, the path we took (on CentOS 6) was to use the 'cgconfig' and 'cgred' services on the login node(s) to set up containers for regular users and quarantine them therein. The config left 4 CPU cores unused by regu

Re: [slurm-users] How to deal with user running stuff in frontend node?

2018-02-15 Thread Bill Barth
We kick them off and lock them out until they respond. Disconnections are common enough that it doesn’t always get their attention. Inability to log back in always does. Best, Bill. Sent from my phone. > On Feb 15, 2018, at 9:25 AM, Patrick Goetz wrote: > > The simple solution is to tell p

Re: [slurm-users] How to deal with user running stuff in frontend node?

2018-02-15 Thread Paul Edmon
We have an automated script, pcull, which goes through and finds abusing processes: https://github.com/fasrc/pcull -Paul Edmon- On 02/15/2018 10:25 AM, Patrick Goetz wrote: The simple solution is to tell people not to do this -- that's what I do. And if that doesn't work threaten to kick them
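As a rough illustration of the general idea of scanning a login node for processes that have used too much CPU time, here is a minimal Python sketch. It is not pcull's actual logic. The function names, the one-hour threshold in the usage example, and the exempt-user list are all assumptions made for illustration. It parses the output of `ps -eo pid,user,time,comm`, where the TIME column has the form [DD-]HH:MM:SS.

```python
import subprocess


def parse_cputime(text):
    """Convert a ps TIME field of the form [DD-]HH:MM:SS to seconds."""
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def find_offenders(ps_text, limit_seconds, exempt_users=("root",)):
    """Return (pid, user, cpu_seconds, command) for processes over the limit."""
    offenders = []
    for line in ps_text.splitlines():
        fields = line.split(None, 3)
        # Skip the header line and anything that is not a process row.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        pid, user, cputime, command = fields
        used = parse_cputime(cputime)
        if user not in exempt_users and used > limit_seconds:
            offenders.append((int(pid), user, used, command.strip()))
    return offenders


def snapshot():
    """Capture the current process table (assumes a procps-style ps)."""
    return subprocess.run(
        ["ps", "-eo", "pid,user,time,comm"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout


if __name__ == "__main__":
    # Report only; what to do with offenders (email, renice, kill) is site policy.
    for pid, user, used, command in find_offenders(snapshot(), limit_seconds=3600):
        print(f"{user} pid={pid} cpu={used}s cmd={command}")
```

Keeping the parsing separate from the `ps` call makes the threshold logic easy to check against canned input before pointing it at a live login node.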

Re: [slurm-users] How to deal with user running stuff in frontend node?

2018-02-15 Thread Patrick Goetz
The simple solution is to tell people not to do this -- that's what I do. And if that doesn't work threaten to kick them off the system. On 02/15/2018 09:11 AM, Manuel Rodríguez Pascual wrote: Hi all, Although this is not strictly related to Slurm, maybe you can recommend me some actions to d

[slurm-users] How to deal with user running stuff in frontend node?

2018-02-15 Thread Manuel Rodríguez Pascual
Hi all, Although this is not strictly related to Slurm, maybe you can recommend me some actions to deal with a particular user. On our small cluster, currently there are no limits to run applications in the frontend. This is sometimes really useful for some users, for example to have scripts moni