On 18/09/2014 16:48, James wrote:
> Hello,
>
> Out Of Memory (OOM) conditions seem to invoke mysterious processes that
> kill the offending processes. OOM seems to be a common problem that pops
> up over and over again within the clustering communities.
>
> I would greatly appreciate (gentoo) illumination on the OOM issues, both
> historically and for folks using/testing systemd. Not a flame_a_thon,
> just some technical information, as I need to understand these issues
> more deeply - how to find, measure and configure around OOM issues - in
> my quest for gentoo clustering.

The need for the OOM killer stems from the fact that memory can be overcommitted. These articles may prove informative:

http://lwn.net/Articles/317814/
http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html

In my case, the most likely trigger - as rare as it is - would be a runaway process that consumes more than its fair share of RAM. Therefore, I make a point of adjusting the OOM badness score of production-critical applications to ensure that they are less likely to be culled.
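On reasonably recent kernels the knob for this is /proc/<pid>/oom_score_adj, which accepts values from -1000 (never select) to 1000 (prefer as victim). A rough sketch, using the current shell's own PID purely for illustration (mysqld below stands in for whatever your critical daemon is):

```shell
# Raising the score needs no privileges; lowering it, which is what you
# would do to protect a daemon, requires root (CAP_SYS_RESOURCE).
echo 300 > /proc/$$/oom_score_adj     # make this shell a preferred victim
cat /proc/$$/oom_score_adj            # prints 300

# For a production-critical daemon you would instead write a negative
# value, e.g. (as root):
#   echo -500 > /proc/$(pidof mysqld)/oom_score_adj
# The kernel's derived badness score can be inspected in
# /proc/<pid>/oom_score; higher means more likely to be culled.
```

Note that oom_score_adj is not inherited across a service restart, so the adjustment is best applied from the init script or unit that starts the daemon.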

If your cases are not pathological, you could increase the amount of memory, be it by additional RAM or additional swap [1]. Alternatively, if you are able to precisely control the way in which memory is allocated and can guarantee that it will not be exhausted, you may elect to disable overcommit, though I would not recommend it.
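For reference, both options boil down to a few commands. The paths and sizes below are illustrative, not a recommendation for your workload:

```shell
# Additional swap as a file, per [1] (as root):
fallocate -l 4G /var/swapfile   # or: dd if=/dev/zero of=/var/swapfile bs=1M count=4096
chmod 600 /var/swapfile
mkswap /var/swapfile
swapon /var/swapfile

# Disabling overcommit means strict accounting (mode 2). The commit limit
# then becomes swap + overcommit_ratio% of RAM, so tune the ratio as well:
sysctl vm.overcommit_memory=2
sysctl vm.overcommit_ratio=80
```

With mode 2 in effect, allocations that would exceed the commit limit fail immediately with ENOMEM instead of succeeding and risking the OOM killer later - which is exactly why it only suits workloads whose allocation behaviour you fully control.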

With NUMA, things may be more complicated because there is the potential for a particular memory node to be exhausted, unless memory interleaving is employed. Indeed, I make a point of using interleaving for MySQL, having gotten the idea from the Twitter fork.
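In practice, interleaving is easiest to apply by wrapping the daemon's invocation with numactl (sys-process/numactl on Gentoo); the mysqld invocation below is illustrative:

```shell
# Spread the daemon's allocations round-robin across all NUMA nodes,
# rather than letting a single node fill up and trigger a node-local OOM:
numactl --interleave=all mysqld --defaults-file=/etc/mysql/my.cnf

# Inspect the node topology and per-node free memory:
numactl --hardware
```

The trade-off is slightly higher average memory latency in exchange for not exhausting any one node, which for a large buffer pool is usually a good deal.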

Finally, make sure you are using at least Linux 3.12, because some improvements have been made there [2].

--Kerin

[1] At a pinch, additional swap may be allocated as a file
[2] https://lwn.net/Articles/562211/#oom
