Thanks Jason, that's very helpful. I'll see whether changing the `lg_chunk` parameter makes a difference.
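For anyone following along and wanting to try the same experiment: jemalloc 4.x reads the chunk size from the MALLOC_CONF environment variable, as Jason suggests below. A minimal sketch (the binary name is just a placeholder):

```shell
# Ask jemalloc 4.x for 1 GiB chunks (2^30 bytes), per Jason's suggestion.
# "./your-program" is a placeholder for whatever binary links jemalloc.
MALLOC_CONF=lg_chunk:30 ./your-program

# The kernel limit discussed in this thread can be inspected with:
cat /proc/sys/vm/max_map_count
```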
In the meantime I found out that one likely reason why RethinkDB generates so many discontiguous VM mappings is our use of `mprotect`. We use `mprotect` to install "guard pages" in heap-allocated coroutine stacks, of which there can be quite a few under some workloads.

I now believe that this isn't really a jemalloc issue per se; at the very least, other factors are involved. We'll look into this more on our side, but I consider this a false alarm for now. Thanks for taking the time to explain things here!

- Daniel

On Fri, Apr 22, 2016 at 10:41 PM, Jason Evans <[email protected]> wrote:

> On Apr 22, 2016, at 10:22 PM, Daniel Mewes <[email protected]> wrote:
> > The reason for the failing `munmap` appears to be that we hit the
> > kernel's `max_map_count` limit.
> >
> > I can reproduce the issue very quickly by reducing the limit through
> > `echo 16000 > /proc/sys/vm/max_map_count`, and it disappears in our tests
> > when increasing it to something like
> > `echo 131060 > /proc/sys/vm/max_map_count`. The default value is 65530,
> > I believe.
> >
> > We used to see this behavior in jemalloc 2.x, but didn't see it in 3.x
> > anymore. It has now re-appeared somewhere between 3.6 and 4.1.
>
> Version 4 switched to per-arena management of huge allocations, and along
> with that completely independent trees of cached chunks. For many
> workloads this means increased virtual memory usage, since cached chunks
> can't migrate among arenas. I have plans to reduce the impact somewhat by
> decreasing the number of arenas by 4X, but the independence of arenas'
> mappings has numerous advantages that I plan to leverage more over time.
>
> > Do you think the allocator should handle reaching the map_count limit
> > and somehow deal with it gracefully (if that's even possible)? Or should
> > we just advise our users to raise the kernel limit, or alternatively try
> > to change RethinkDB's allocation patterns to avoid hitting it?
>
> I'm surprised you're hitting this, because the normal mode of operation is
> for jemalloc's chunk allocation to get almost all contiguous mappings,
> which means very few distinct kernel VM map entries. Is it possible that
> RethinkDB is routinely calling mmap() and interspersing mappings that are
> not a multiple of the chunk size? One would hope that the kernel could
> densely pack such small mappings in the existing gaps between jemalloc's
> chunks, but unfortunately Linux uses fragile heuristics to find available
> virtual memory (the exact problem that --disable-munmap works around).
>
> To your question about making jemalloc gracefully deal with munmap()
> failure, it seems likely that mmap() is in imminent danger of failing under
> these conditions, so there's not much that can be done. In fact, jemalloc
> only aborts if the abort option is set to true (the default for debug
> builds), so the error message jemalloc is printing probably doesn't
> directly correspond to a crash.
>
> As a workaround, you could substantially increase the chunk size (e.g.
> MALLOC_CONF=lg_chunk:30), but better would be to diagnose and address
> whatever is causing the terrible VM map fragmentation.
>
> Thanks,
> Jason
_______________________________________________
jemalloc-discuss mailing list
[email protected]
http://www.canonware.com/mailman/listinfo/jemalloc-discuss
