Re: [I] Examine the affects of MADV_RANDOM when MGLRU is enabled in Linux kernel [lucene]

via GitHub Fri, 25 Apr 2025 04:06:58 -0700


mikemccand commented on issue #14408:
URL: https://github.com/apache/lucene/issues/14408#issuecomment-2830109470


   > > The Linux change targets both MGLRU and normal LRU. The impact is more 
pronounced in MGLRU, as page reclamation is more aggressive there. However, the 
semantic change for this advice is the same in both cases. In the latest 
kernels, using `MADV_RANDOM` does not mark the page as accessed, regardless of 
whether MGLRU is in use. That's a big shift in semantics for our default read 
advice.
   > 
   > Easy argument to change the default to `NORMAL`.
   
   +1 to go back to `NORMAL` as default, until we can better understand the 
regressions we (OpenSearch users, Elasticsearch users, and Amazon product 
search (my team)) are seeing with `MADV_RANDOM`.
   
   I think `MADV_RANDOM` can also be harmful for "hot" (index expected to 
mostly fit in RAM) use cases.
   
   For our service (Amazon product search), which is mostly hot, we had to 
hard-override back to `IOContext.DEFAULT` for `.vec` and `.veq` (quantized 
vectors) in a hackity way: we subclass `MMapDirectory` to insert a shim into 
`openInput` that rewrites the `IOContext` (oooh, as @jpountz describes at 
https://github.com/apache/lucene/issues/14348#issuecomment-2730966937, except 
opposite). We needed this in the cases (lighting a new commit point during NRT 
replication) where we had to turn off `MMapDirectory.setPreload`.
   
   At Lucene's defaults (`MADV_RANDOM` for the KNN vector files) we saw 
horribly slow warmup of our searchers ... basically, paging in all those 
vectors one at a time as "real" queries visited the HNSW graph was crazy slow 
(many minutes) even on crazy fast infra (AWS), whereas letting the OS do its 
default "thing" (bulk readahead of N pages when a page miss happens?) was much 
quicker, with much less "page fault amplification".
   
   Benchmarks in luceneutil also hit this -- minutes and minutes of paging in 
the HNSW graph (without `.setPreload`) from a fast local SSD, but I think 
luceneutil is still using Lucene's `IOContext` defaults here.
   
   Actually, if we `MADV_RANDOM` and `.setPreload` to load `.vec`, what is the 
effect?  Does the preloading still work (OS caches/touches all pages, and does 
mark them as accessed (so they stay cached), despite the `MADV_RANDOM`)?  Is it 
much slower to preload when you `MADV_RANDOM` (though presumably it is 
sequentially bringing pages in)?
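   
   One rough way to probe that last question outside Lucene is a small script 
(a sketch, not something anyone in this thread ran; `timed_preload` is a 
hypothetical helper name). It maps a file, optionally applies an advice such as 
`mmap.MADV_RANDOM` (Python 3.8+ on Linux), then touches one byte per page in 
order, which approximates a preload. It says nothing about whether the pages end 
up marked as accessed, and the timings only mean something on a cold page cache.

```python
import mmap
import os
import tempfile
import time

PAGE = mmap.PAGESIZE


def timed_preload(path, advice=None):
    """Map `path` read-only, optionally apply madvise `advice`, then read one
    byte per page in file order. Returns (pages_touched, elapsed_seconds)."""
    size = os.path.getsize(path)
    with open(path, "rb") as f, mmap.mmap(
        f.fileno(), size, access=mmap.ACCESS_READ
    ) as m:
        # madvise is only exposed on platforms that support it.
        if advice is not None and hasattr(m, "madvise"):
            m.madvise(advice)
        start = time.perf_counter()
        touched = 0
        for off in range(0, size, PAGE):
            m[off]  # reading one byte faults the page in if it is not cached
            touched += 1
        return touched, time.perf_counter() - start


if __name__ == "__main__":
    # Assumption: a small temp file stands in for a real .vec file. A file that
    # was just written is already in the page cache, so point this at a large
    # file on a cold cache to see any real difference.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(PAGE * 256))
    try:
        for name in ("MADV_NORMAL", "MADV_RANDOM"):
            advice = getattr(mmap, name, None)  # None if the constant is missing
            pages, secs = timed_preload(tmp.name, advice)
            print(f"{name}: touched {pages} pages in {secs:.6f}s")
    finally:
        os.unlink(tmp.name)
```

   Comparing the two timings on the same large file, with caches dropped between 
runs, would at least show whether a sequential touch pays a readahead penalty 
under `MADV_RANDOM`.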
   
   > AFAIK https://github.com/apache/lucene/issues/14422 is working on fixing 
that "real problem".
   
   +1 to work towards this more general fix.  But, sheesh, it looks so 
complicated, depending on hot vs cold use case, preloading or not, which part 
of the Lucene index (KNN, terms, postings), Linux kernel versions, ... in the 
meantime I think we should revert to `NORMAL`/`DEFAULT` as Lucene's 
default...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

