mikemccand commented on issue #15286:
URL: https://github.com/apache/lucene/issues/15286#issuecomment-3427135304

   > So, before we had bulk scoring, I tried this and it actually harmed 
performance :(. But, I was doing something silly I think 
([#13586](https://github.com/apache/lucene/pull/13586))
   
   Oooh thanks for the pointer @benwtrent.  Do you know what silly thing was 
happening in that benchy?  From the PR it looks like there was confusion about 
the results (a common occurrence when benchmarking!!  sigh ¯\_(ツ)_/¯), but no 
smoking gun found?
   
   This comment from @jpountz is also interesting:
   
   > Right now prefetch() has logic to disable itself if data is consistently 
already in the page cache. Thinking out loud, if your benchmark shows that 
prefetch() hurts more than it helps when the data is often but not always in 
the page cache, we could look into disabling madvise in that case too and only 
enable it when data is rarely already in the page cache.
   
   At Amazon we are confused by profiling of one of our PKLookup-heavy indexing 
runs, which seems to show non-trivial time spent in `mincore` (the OS-level 
function that reports which pages of a virtual address range are resident), 
invoked from `MemorySegment.isLoaded`, which is (I think?) the method we use to 
turn off prefetch when things already seem hot enough.  Unfortunately, 
`mincore` is O(N) (N = number of pages) and requires allocating a temporary 
`byte[]` where it records the status of each page.  But I think N should 
generally be super tiny for how Lucene prefetches?  Typically 1 or maybe 2 
pages?  We prefetch a terms block (ish) to look up a term, what else?
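
   To make that overhead concrete, here is a minimal sketch of the 
`isLoaded()`-gated check I mean.  This is not Lucene's actual 
`MemorySegmentIndexInput` code, just an illustration; the `shouldPrefetch` 
helper is hypothetical:

```java
// Sketch only -- NOT Lucene's MemorySegmentIndexInput.  The point is that
// MemorySegment.isLoaded() is backed by mincore(2), which walks every page in
// the slice and allocates a per-page status vector: a small but real cost on
// the query hot path.
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PrefetchCheckSketch {

  /** Hypothetical helper: is a prefetch hint worth issuing for this slice? */
  static boolean shouldPrefetch(MemorySegment mapped, long offset, long length) {
    MemorySegment slice = mapped.asSlice(offset, length);
    // isLoaded() -> mincore(): O(number of pages).  For a terms block that is
    // usually 1-2 pages, but it is still a syscall plus a temporary allocation.
    return slice.isLoaded() == false;
    // If true, Lucene would then ask the kernel for readahead via
    // madvise(MADV_WILLNEED) through its native-access layer (sketched further
    // below).
  }

  public static void main(String[] args) throws Exception {
    Path path = Path.of(args[0]);
    try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ);
        Arena arena = Arena.ofConfined()) {
      MemorySegment mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size(), arena);
      long len = Math.min(4096, ch.size());
      System.out.println("prefetch needed: " + shouldPrefetch(mapped, 0, len));
    }
  }
}
```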
   
   We (Amazon) then re-ran with a hack to turn off prefetch entirely and saw 
some gains (50 minutes faster on an 11.5 hour indexing run), not as much as we 
had expected given the scary profiler output showing lots of time in `mincore` 
(so maybe there be ghosts a-lurking; it is async-profiler, after all).  This 
was just one internal datapoint; we haven't separately tried to make a 
repeatable benchy, etc.
   
   Still, maybe we should somehow reduce this prefetch overhead for hot indices.
   
   > One downside to how Lucene does prefetching is that the call to adjust the 
OS advise is synchronous. During a hot path query, this just isn't free.
   
   I can see the cost from `isLoaded()` (above) ... but 
`madvise(MADV_WILLNEED)` isn't supposed to do anything costly?  OK, so I asked 
Claude to dig into the Linux kernel sources to explain what this actually does.  
Here's 
[its explanation](https://claude.ai/share/d3b8c7c6-b96b-43b4-8280-bb95fe10b01c) 
and the 
[code snippet it sucked out of the kernel sources](https://claude.ai/public/artifacts/664ba89d-3ca8-447b-9c40-fc2324c57338) 
(hopefully no hallucinations!).  It looks like it triggers async readahead 
(hmm, does that mean it's a no-op if we had previously set `MADV_RANDOM`?), and 
it drops/re-acquires locks and file references along the way (sounds like that 
could take non-trivial time, maybe).  Also, TIL you can `MADV_WILLNEED` swap 
(virtual address space not backed by a memory-mapped file) too.
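
   And just to make the "synchronous call on the hot path" point concrete, here 
is roughly how a `madvise(MADV_WILLNEED)` downcall looks from Java via the FFM 
API.  This is a hedged sketch, not Lucene's actual `NativeAccess` code, and the 
constants assume Linux with 4 KB pages:

```java
// Hedged sketch: calling madvise(2) directly through the Java FFM API.  This is
// NOT Lucene's NativeAccess code; it only illustrates the synchronous syscall
// that sits on the prefetch path.  MADV_WILLNEED = 3 and 4 KB pages are Linux
// assumptions.  (Recent JDKs want --enable-native-access for this.)
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.invoke.MethodHandle;
import static java.lang.foreign.ValueLayout.ADDRESS;
import static java.lang.foreign.ValueLayout.JAVA_INT;
import static java.lang.foreign.ValueLayout.JAVA_LONG;

public class MadviseSketch {
  private static final int MADV_WILLNEED = 3; // Linux-specific value
  private static final long PAGE_SIZE = 4096; // assumption; query the OS in real code

  // int madvise(void *addr, size_t length, int advice)
  private static final MethodHandle MADVISE = Linker.nativeLinker().downcallHandle(
      Linker.nativeLinker().defaultLookup().find("madvise").orElseThrow(),
      FunctionDescriptor.of(JAVA_INT, ADDRESS, JAVA_LONG, JAVA_INT));

  /** Hint the kernel to start readahead for the given mapped slice. */
  static int willNeed(MemorySegment mappedSlice) throws Throwable {
    // madvise requires a page-aligned start address, so round down and extend
    // the length accordingly.
    long addr = mappedSlice.address();
    long aligned = addr & -PAGE_SIZE;
    long length = mappedSlice.byteSize() + (addr - aligned);
    return (int) MADVISE.invokeExact(MemorySegment.ofAddress(aligned), length, MADV_WILLNEED);
  }
}
```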
   
   Anyway, a better long-term solution for Lucene is "real" async I/O (like you 
are doing in #15224)?  N virtual threads reading concurrently and acting 
efficiently on the bytes as they arrive... even in the hot case this can be a 
concurrency win, since Lucene would naturally be expressing its code as 
data-driven concurrency, so the JVM could in theory use multiple real threads 
to process each `byte[]` concurrently when all bytes are hot.  HNSW graph crawl 
seems embarrassingly concurrent in this sense.
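
   Purely as an illustration of that shape (not the actual #15224 design): a 
toy sketch where N virtual threads fan out positional reads and process each 
buffer as soon as its bytes arrive:

```java
// Illustrative sketch only (not the actual #15224 design): fan out N positional
// reads on virtual threads and act on each buffer as soon as its bytes arrive.
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncReadSketch {

  /** Hypothetical unit of work: a byte range to read and decode. */
  record Block(long offset, int length) {}

  static void readAll(Path path, List<Block> blocks) throws Exception {
    try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ);
        ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
      List<Future<?>> futures = new ArrayList<>();
      for (Block b : blocks) {
        futures.add(exec.submit(() -> {
          ByteBuffer buf = ByteBuffer.allocate(b.length());
          // Positional reads are safe to issue concurrently from many threads.
          // (For simplicity this assumes a single read fills the buffer.)
          ch.read(buf, b.offset());
          buf.flip();
          process(buf); // act on the bytes as soon as they arrive
          return null;
        }));
      }
      for (Future<?> f : futures) {
        f.get(); // propagate any failure
      }
    }
  }

  static void process(ByteBuffer buf) {
    // e.g. decode a terms block, or expand the next set of HNSW neighbors
  }
}
```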


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

