mikemccand opened a new issue, #15068: URL: https://github.com/apache/lucene/issues/15068
### Description

[I'm opening this for discussion ... I'm not sure we have to fix anything here, but I at least want to document the situation so that if other Lucene users hit the max maps limit, we can quickly explain why]

At Amazon product search, we've seen our indexer processes sometimes trip the OS hard limit on the number of memory-mapped segments per process (`vm.max_map_count`, 64 K by default, though [modern Linuxes (Lini?) have increased it to 1048576](https://archlinux.org/news/increasing-the-default-vmmax_map_count-value/)), killing the indexing process.

Looking at the maps (`cat /proc/<pid>/maps`), it's clear we are leaking maps for `_N.si` files: e.g. the exact same `_7.si` file will be mapped 76 times, and the same goes for `_8.si` and all other open segments.

I have a small test case that reproduces this leak (I'll attach it shortly) on 9.12.x, with Java 21 or 24. It does not reproduce on 10.x because we've [changed `_N.si` files to open with `IOContext.READONCE`](https://github.com/apache/lucene/pull/12027), which [turns off the Arena pooling](https://github.com/apache/lucene/blob/branch_10_2/lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInputProvider.java#L61) (sets `confined=true`).

To repro, you need an index that has at least one segment. Then you need to hold open a `DirectoryReader` (which creates an Arena for each segment's open files). Then, periodically, read the latest `segments_N` file, which opens and closes each segment's `_N.si` file. Those `_N.si` maps are added to the still-open Arenas and are not unmapped until you close your reader or the segment is merged away.

This is not serious for us (Amazon product search) -- we have a workaround for 9.12.x (use `NIOFSDirectory`), or we can upgrade to 10.x. Maybe setting `-Dorg.apache.lucene.store.MMapDirectory.sharedArenaMaxPermits` to a smallish value (it defaults to 1024) would work too, not sure.
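For anyone who wants to check whether their own process shows the same pattern, the `cat /proc/<pid>/maps` inspection described above can be scripted. The following is only a minimal sketch, not the test case mentioned in the issue: the class name `MapCounter` and its method are hypothetical, it uses only the JDK, and it assumes the usual Linux `maps` line layout (`address perms offset dev inode pathname`). It counts how many mapping entries point at each file and prints the `.si` files that are mapped more than once.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene.
public class MapCounter {

  /** Counts how many mapping entries reference each file-backed path. */
  public static Map<String, Integer> countMappings(List<String> mapsLines) {
    Map<String, Integer> counts = new TreeMap<>();
    for (String line : mapsLines) {
      // Assumed layout: address perms offset dev inode [pathname]
      String[] fields = line.trim().split("\\s+", 6);
      if (fields.length < 6 || !fields[5].startsWith("/")) {
        continue; // anonymous mapping, or a pseudo-entry such as [heap]
      }
      counts.merge(fields[5], 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) throws IOException {
    // Pass a pid as the first argument; defaults to this JVM's own maps.
    String pid = args.length > 0 ? args[0] : "self";
    List<String> lines = Files.readAllLines(Path.of("/proc", pid, "maps"));
    countMappings(lines).forEach((path, n) -> {
      if (n > 1 && path.endsWith(".si")) {
        System.out.println(n + "\t" + path);
      }
    });
  }
}
```

Run against the indexer's pid, a leak of the kind described here would presumably show up as each `_N.si` path listed with a count that grows every time the latest commit point is re-read.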
Context: we [added this cool Arena pooling](https://github.com/apache/lucene/pull/13570) to Lucene to amortize the sometimes highish JDK cost of unmap (which deopts the top frames of all running threads to check that they are not accessing the virtual address space about to be unmapped). That cost was in turn discovered by [an upstream benchmark](https://github.com/dacapobench/dacapobench/issues/264) (thank you!!). https://github.com/apache/lucene/pull/13555 and https://github.com/apache/lucene/issues/13325 have more context. [JDK-8335480](https://bugs.openjdk.org/browse/JDK-8335480), delivered in JDK 24, tries to reduce the JDK deopt cost of unmap.

Our usage was somewhat expert (periodically reading the latest commit point (`segments_N` file) while indexing), and the leak is fixed in 10.x. But there is maybe still some open risk if an app uses Lucene APIs to open other files, e.g. maybe the app does lots of deletes against old segments, and maps/unmaps those deletion files, and those "leak"? Or maybe these paths that reproduce the "leak" are so expert that users won't hit them in practice in 9.x / 10.x? Or maybe we should decrease the default max maps in a single Arena from 1024?

### Version and environment details

_No response_