Every cluster I've ever managed has had this issue.  Once cgroup support
arrived in Linux, the path we took (on CentOS 6) was to use the 'cgconfig'
and 'cgred' services on the login node(s) to set up containers for regular
users and quarantine them there.  The config reserves 4 CPU cores that
regular users can't touch (cpuset), lets them use up to 100% of the 16
cores they're granted but yield cycles as other users demand them (cpu
shares), keeps a modest amount of RAM off-limits to regular users as a
whole, and caps each individual user at a few GB.
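
For what it's worth, on a stock CentOS 6 libcgroup install (assuming the
package still ships SysV init scripts under those names) getting the two
services running on the login node is roughly:

   chkconfig cgconfig on ; service cgconfig start
   chkconfig cgred on ; service cgred start

Newer systemd-based distros will need a different mechanism.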

cgrules.conf is evaluated on a first-match basis, so at the top we make sure
root and the sysadmins don't have any limits.  Support staff get only the
overall limits for regular users, and everyone else who isn't a daemon
account, etc., gets a personal cgroup with the most stringent limits.
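
A quick sanity check, assuming the libcgroup command-line tools are
installed, is to log in as a regular user and see where the session landed,
e.g.:

   cat /proc/self/cgroup
   cgget -r cpuset.cpus -r memory.limit_in_bytes regular_users/$USER

which should show the shell sitting under regular_users/<username> with the
per-user limits from the template below.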




/etc/cgconfig.conf:
mount {
        cpuset  = /cgroup/cpuset;
        cpu     = /cgroup/cpu;
        #cpuacct        = /cgroup/cpuacct;
        memory  = /cgroup/memory;
        #devices        = /cgroup/devices;
        #freezer        = /cgroup/freezer;
        #net_cls        = /cgroup/net_cls;
        #blkio  = /cgroup/blkio;
}

# Aggregate limits shared by all regular users combined:
group regular_users {
  cpu {
    cpu.shares=100;
  }
  cpuset {
    cpuset.cpus=4-19;
    cpuset.mems=0-1;
  }
  memory {
    memory.limit_in_bytes=48G;
    memory.soft_limit_in_bytes=48G;
    memory.memsw.limit_in_bytes=60G;
  }
}

# Per-user limits; %U is replaced with each user's name:
template regular_users/%U {
  cpu {
    cpu.shares=100;
  }
  cpuset {
    cpuset.cpus=4-19;
    cpuset.mems=0-1;
  }
  memory {
    memory.limit_in_bytes=4G;
    memory.soft_limit_in_bytes=2G;
    memory.memsw.limit_in_bytes=6G;
  }
}
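
After editing cgconfig.conf the hierarchy can be rebuilt without a reboot;
on CentOS 6 either of the following should work, if memory serves (a full
service restart may complain if processes are already classified into the
groups):

   service cgconfig restart
   cgconfigparser -l /etc/cgconfig.conf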


/etc/cgrules.conf:
#
# Include an explicit rule for root, otherwise commands with
# the setuid bit set on them will inherit the original user's
# gid and probably wind up under @everyone:
#
root            cpuset,cpu,memory       /

#
# sysadmin
#
user1           cpuset,cpu,memory       /
user2           cpuset,cpu,memory       /

#
# sysstaff
#
user3           cpuset,cpu,memory       regular_users/
user4           cpuset,cpu,memory       regular_users/

#
# workgroups:
#
@everyone               cpuset,cpu,memory               regular_users/%U/
@group1                 cpuset,cpu,memory               regular_users/%U/
@group2                 cpuset,cpu,memory               regular_users/%U/
  :
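
cgred (cgrulesengd) also has to pick up changes after you edit
cgrules.conf; a service restart is the blunt approach, and the daemon is
supposed to re-read the rules file on SIGUSR2:

   service cgred restart
   pkill -USR2 cgrulesengd

Either way, already-running processes stay in whatever cgroup they're in;
only new processes get classified under the updated rules (cgclassify can
move stragglers by hand).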






> On Feb 15, 2018, at 10:11 AM, Manuel Rodríguez Pascual 
> <manuel.rodriguez.pasc...@gmail.com> wrote:
> 
> Hi all, 
> 
> Although this is not strictly related to Slurm, maybe you can recommend
> some actions to deal with a particular user. 
> 
> On our small cluster there are currently no limits on running applications
> on the frontend.  This is sometimes really useful for some users, for
> example to have scripts that monitor the execution of jobs and make
> decisions based on the partial results.
> 
> However, we have one user who keeps abusing this: when the job queue is
> long and the wait time is significant, he sometimes runs his jobs on the
> frontend, resulting in 100% CPU load and delays in using it for the things
> it is supposed to serve (user login, monitoring and so on). 
> 
> Have you faced the same issue?  Is there any solution?  I am thinking about
> using ulimit to limit the execution time of these jobs on the frontend to 5
> minutes or so.  That does not seem very elegant, though, as other users
> could perform the same abuse in the future, and he should still be able to
> run low CPU-consuming jobs for a longer period.  However, I am not an
> experienced sysadmin, so I am completely open to suggestions or different
> ways of facing this issue.
> 
> Any thoughts?
> 
> cheers, 
> 
> 
> 
> 
> Manuel


::::::::::::::::::::::::::::::::::::::::::::::::::::::
Jeffrey T. Frey, Ph.D.
Systems Programmer V / HPC Management
Network & Systems Services / College of Engineering
University of Delaware, Newark DE  19716
Office: (302) 831-6034  Mobile: (302) 419-4976
::::::::::::::::::::::::::::::::::::::::::::::::::::::



