On Thu, May 13, 2021 at 11:04:14PM -0400, Theodore Ts'o wrote:
> I'll give you another example that we learned the hard way. Depending
> on how tight you make your memory cgroups and how tight you constrain
> your I/O controller, it's possible for write throttling --- where
> processes which are dirtying memory faster than they can be written
> out are put to sleep instead of triggering the OOM killer. It turns
> out write throttling when the total system memory is low is quite
> different from a particular memory cgroup is low on free memory, and
> so the complex interactions between the memory cgroup controller and
> and the I/O cgroup controller is another reason why there appears to
> be a guaranteed employment act for data center kernel engineers. :-)

Rereading this, I realized it wasn't completely clear, because I left
out part of the sentence. Let me reword that.

If you have a memory cgroup controller with tight memory constraints
on a container, and an I/O controller throttling I/O in that same
container, the OOM killer can end up killing off one or more of the
processes in that container. Write throttling caused by memory cgroup
limits may not work well enough to prevent the OOM killer from
deciding to start randomly killing processes. The I/O throttling
essentially makes things worse, because it prevents the I/O from
relieving the memory pressure. If nothing prevents processes from
continuing to dirty pages, the OOM killer goes on a murderous rampage
and starts killing innocent processes.

What's important to note is that all of these components are "working"
exactly as advertised. It's just that the combined effects may not be
what you wanted, which is an overall systems problem.

Cheers,

					- Ted
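
As an illustration of the interaction described above, here is a toy
model in Python. It is not kernel code and does not reflect how the
kernel actually accounts for dirty pages. The function name, the
parameters, and the "throttle floor" (the assumption that write
throttling can slow a dirtier down but not stop it completely) are all
hypothetical, chosen only to show how a tight memory limit plus a tight
I/O limit can combine badly even though each piece behaves as designed.

```python
def simulate(mem_limit_pages, dirty_rate, writeback_rate, throttle_floor,
             ticks=1000):
    """Toy model of one container (all names and numbers are hypothetical).

    Returns the tick at which dirty pages reach the memory limit, which
    stands in for "the OOM killer fires", or None if that never happens.
    """
    dirty = 0.0
    for tick in range(ticks):
        # Write throttling: slow the dirtier as dirty pages approach the
        # limit. Assumption: it can never be slowed below throttle_floor.
        headroom = 1.0 - dirty / mem_limit_pages
        rate = max(dirty_rate * headroom, throttle_floor)
        dirty += rate

        # Writeback cleans pages, but only as fast as the I/O controller
        # allows for this container.
        dirty = max(dirty - writeback_rate, 0.0)

        if dirty >= mem_limit_pages:
            return tick
    return None


# Same memory limit and same workload; only the I/O limit differs.
relaxed = simulate(mem_limit_pages=1000, dirty_rate=100,
                   writeback_rate=50, throttle_floor=10)
squeezed = simulate(mem_limit_pages=1000, dirty_rate=100,
                    writeback_rate=5, throttle_floor=10)
print("relaxed I/O limit:", relaxed)
print("tight I/O limit:  ", squeezed)
```

In this sketch the relaxed case settles at a steady level of dirty
pages, while the tight case keeps growing until it hits the limit,
because the throttled writeback rate is lower than the slowest rate at
which the process can still dirty pages.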