ChrisHegarty opened a new issue, #14408: URL: https://github.com/apache/lucene/issues/14408
With the relatively recent capability to call `madvise` in Lucene, we've started to use `MADV_RANDOM` in several places where it makes conceptual sense, e.g. for accessing vector data when navigating the graph. The memory access is truly random, but we've seen several reports of performance regressions that appear to result from this.

Of particular concern is the interaction of `MADV_RANDOM` with Multi-Gen LRU (MGLRU) [1]. From my reading of the code, and someone please correct me, the semantics of `MADV_RANDOM` have changed in the kernel with MGLRU, so that pages are reclaimed more eagerly, even when there is no memory pressure. Specifically, this is the behavior after https://github.com/torvalds/linux/commit/8788f6781486769d9598dcaedc3fe0eb12fc3e59.

This Elasticsearch issue has more of the lower-level details: https://github.com/elastic/elasticsearch/issues/124499. This issue may also be connected: https://github.com/apache/lucene/issues/14281.

I opened this issue to facilitate a discussion and hopefully converge on a direction for mitigating these performance regressions. For example, one possible mitigation would be to expose the `ReadAdvice` that will be used as part of the API, so that callers have more fine-grained control over whether or not to use `MADV_RANDOM`.

[1] https://docs.kernel.org/admin-guide/mm/multigen_lru.html
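To make the proposed mitigation more concrete, here is a minimal Java sketch of what caller-controlled advice could look like. It is only an illustration for discussion: the types `Advice`, `Usage`, and `AdvicePolicy` are hypothetical names invented for this example and are not Lucene's actual API. The non-vector default is likewise just an assumption.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical sketch only; none of these types are Lucene's actual API. */
final class ReadAdviceSketch {

  /** Hints that loosely mirror madvise flags (assumed set, for illustration). */
  enum Advice { NORMAL, RANDOM, SEQUENTIAL }

  /** Hypothetical categories of file access a caller might want to tune. */
  enum Usage { VECTOR_DATA, OTHER }

  /** Immutable mapping from usage to advice that callers can override. */
  static final class AdvicePolicy {
    private final Map<Usage, Advice> byUsage;

    private AdvicePolicy(Map<Usage, Advice> byUsage) {
      this.byUsage = byUsage;
    }

    /** Defaults that follow the conceptual access pattern. */
    static AdvicePolicy defaults() {
      Map<Usage, Advice> m = new EnumMap<>(Usage.class);
      m.put(Usage.VECTOR_DATA, Advice.RANDOM); // graph navigation is random access
      m.put(Usage.OTHER, Advice.NORMAL);       // assumed default for everything else
      return new AdvicePolicy(m);
    }

    /** Returns a copy of this policy with a single usage overridden. */
    AdvicePolicy with(Usage usage, Advice advice) {
      Map<Usage, Advice> copy = new EnumMap<>(byUsage);
      copy.put(usage, advice);
      return new AdvicePolicy(copy);
    }

    /** Looks up the advice for a usage, falling back to NORMAL. */
    Advice adviceFor(Usage usage) {
      return byUsage.getOrDefault(usage, Advice.NORMAL);
    }
  }

  public static void main(String[] args) {
    AdvicePolicy defaults = AdvicePolicy.defaults();
    // A caller affected by the regression opts out of RANDOM for vector data.
    AdvicePolicy noRandom = defaults.with(Usage.VECTOR_DATA, Advice.NORMAL);
    System.out.println(defaults.adviceFor(Usage.VECTOR_DATA)); // prints RANDOM
    System.out.println(noRandom.adviceFor(Usage.VECTOR_DATA)); // prints NORMAL
  }
}
```

The point of this shape is that the default still reflects the conceptual access pattern, while a deployment that hits the MGLRU behavior could fall back to normal advice without a code change in the library.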