On 18/09/2014 16:48, James wrote:
> Hello,
> An Out Of Memory (OOM) condition seems to invoke a mysterious mechanism
> that kills the offending processes. OOM seems to be a common problem
> that pops up over and over again within the clustering communities.
> I would greatly appreciate (gentoo) illumination on the OOM issues,
> both historically and for folks using/testing systemd. Not a flame_a_thon,
> just some technical information, as I need to understand these issues
> more deeply: how to find, measure and configure around OOM issues in my
> quest for gentoo clustering.
The need for the OOM killer stems from the fact that memory can be
overcommitted. These articles may prove informative:
http://lwn.net/Articles/317814/
http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html
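To see the accounting on a running system, compare the committed address
space against the kernel's commit limit in /proc/meminfo. A quick Python
sketch (it does nothing beyond reading that file):

    #!/usr/bin/env python3
    # Compare committed memory with the kernel's commit limit, as
    # reported by /proc/meminfo (all values are in kB).
    fields = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            fields[key] = int(value.split()[0])

    for key in ("MemTotal", "CommitLimit", "Committed_AS"):
        print("{0:>14}: {1:>12} kB".format(key, fields[key]))

    if fields["Committed_AS"] > fields["MemTotal"]:
        print("More memory is committed than is physically present.")

Committed_AS exceeding MemTotal is perfectly normal under the default
heuristic; it is exactly the situation the OOM killer exists to resolve.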
In my case, the most likely trigger - as rare as it is - would be a
runaway process that consumes more than its fair share of RAM.
Therefore, I make a point of adjusting the OOM score (oom_score_adj) of
production-critical applications so that they are less likely to be culled.
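For reference, the adjustment itself is nothing more than a write to
/proc/<pid>/oom_score_adj. A rough Python sketch (the PID and value are
whatever you choose; negative values require root):

    #!/usr/bin/env python3
    # Lower the OOM badness score of a running process so that the OOM
    # killer is less inclined to pick it. Valid values range from -1000
    # (never kill) to +1000 (preferred victim).
    import sys

    pid, adj = int(sys.argv[1]), int(sys.argv[2])  # e.g. oom-adjust.py 1234 -500

    with open("/proc/%d/oom_score_adj" % pid, "w") as f:
        f.write(str(adj))

    with open("/proc/%d/oom_score" % pid) as f:
        print("oom_score for PID %d is now %s" % (pid, f.read().strip()))

The same can be achieved with a simple echo, or permanently via the
OOMScoreAdjust= option in a unit file where systemd is in use.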
If your cases are not pathological, you could increase the amount of
memory, be it by additional RAM or additional swap [1]. Alternatively,
if you are able to precisely control the way in which memory is
allocated and can guarantee that it will not be exhausted, you may elect
to disable overcommit, though I would not recommend it.
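For completeness, strict accounting is a matter of setting
vm.overcommit_memory=2, with vm.overcommit_ratio deciding how much of RAM
counts towards the limit. A sketch that inspects the current settings and,
if passed --strict, disables overcommit at runtime (use sysctl.conf to
make it persistent):

    #!/usr/bin/env python3
    # Show the overcommit settings and optionally switch to strict mode.
    # Mode 0 = heuristic, 1 = always overcommit, 2 = never overcommit,
    # where CommitLimit = swap + RAM * overcommit_ratio / 100.
    import sys

    def read(path):
        with open(path) as f:
            return f.read().strip()

    print("vm.overcommit_memory =", read("/proc/sys/vm/overcommit_memory"))
    print("vm.overcommit_ratio  =", read("/proc/sys/vm/overcommit_ratio"))

    if "--strict" in sys.argv:
        with open("/proc/sys/vm/overcommit_memory", "w") as f:
            f.write("2")
        print("Strict accounting enabled; allocations beyond the limit will fail.")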
With NUMA, things may be more complicated because there is the potential
for a particular memory node to be exhausted, unless memory interleaving
is employed. Indeed, I make a point of using interleaving for MySQL,
having gotten the idea from the Twitter fork.
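In practice, interleaving is just a matter of starting the server under an
interleaved memory policy, e.g. with numactl. A trivial Python wrapper,
assuming numactl is installed (mysqld is only an example command here):

    #!/usr/bin/env python3
    # Re-exec the given command under an interleaved NUMA policy so its
    # allocations are spread round-robin across all memory nodes rather
    # than filling one node first.
    import os, sys

    cmd = sys.argv[1:] or ["mysqld"]   # the command to wrap is up to you
    os.execvp("numactl", ["numactl", "--interleave=all"] + cmd)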
Finally, make sure you are using at least Linux 3.12, because improvements
to the OOM killer were made there [2].
--Kerin
[1] At a pinch, additional swap may be allocated as a file (see the sketch below)
[2] https://lwn.net/Articles/562211/#oom
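As regards [1], adding a swap file boils down to fallocate, mkswap and
swapon. A rough sketch, to be run as root (the path and size are only
examples; on filesystems where fallocate does not yield a usable swap
file, dd from /dev/zero is the traditional fallback):

    #!/usr/bin/env python3
    # Create and enable a 4 GiB swap file. Add an fstab entry if it is
    # to survive a reboot.
    import subprocess

    SWAPFILE, SIZE = "/swapfile", "4G"

    subprocess.check_call(["fallocate", "-l", SIZE, SWAPFILE])
    subprocess.check_call(["chmod", "600", SWAPFILE])
    subprocess.check_call(["mkswap", SWAPFILE])
    subprocess.check_call(["swapon", SWAPFILE])
    subprocess.check_call(["swapon", "-s"])   # print the swap summary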