mikemccand commented on PR #14156:
URL: https://github.com/apache/lucene/pull/14156#issuecomment-3649593085

   In our use case (Amazon customer-facing product search) our indices are 
nearly always hot (sometimes not, and that quickly gets exciting), and Lucene's 
efforts to prefetch (`madvise(MADV_WILLNEED)` and maybe `isLoaded()`) seem to 
hurt ... we see ~3% improvement in query throughput (red-line QPS), at least in 
offline benchmarks, by forcefully disabling some of the prefetching.
   
   I went digging into the `MemorySegment.isLoaded()` implementation ... its 
name sounds so innocuous, like a getter, but it's doing a lot of work under 
that innocuous name.  It `malloc()`s an array (length `numPages`) to hold the 
per-page 0-or-1 results returned from the Linux kernel's `mincore()` API -- an 
inefficient bitset! (one byte per bit!  maybe kernel developers thought 
userland developers might not understand bits/bytes?) -- and then it's 
`O(numPages)` since the Linux kernel digs through its VM page table to set 
those bits (hmm: I wonder how this works with huge pages, transparent or 
not?).  And holy smokes is the `mincore` implementation hairy!  Then it's 
another `for` loop in `isLoaded` to check the returned per-page array to see if 
every page is loaded, so the hot case is the worst case for run time, I 
think (you run through the whole loop only to discover that every page is loaded).
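
   To get a feel for that cost outside of Lucene, here's a minimal standalone 
sketch (assuming Linux, JDK 22+, and a large file path passed on the command 
line -- all my assumptions, not anything from Lucene) that times 
`MemorySegment.isLoaded()` on a big read-only mapping:

   ```java
   import java.io.IOException;
   import java.lang.foreign.Arena;
   import java.lang.foreign.MemorySegment;
   import java.nio.channels.FileChannel;
   import java.nio.file.Path;
   import java.nio.file.StandardOpenOption;

   public class IsLoadedCost {
     public static void main(String[] args) throws IOException {
       Path path = Path.of(args[0]); // e.g. a multi-GB index file
       try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ);
           Arena arena = Arena.ofConfined()) {
         MemorySegment seg = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size(), arena);
         // isLoaded() walks every page of the mapping (mincore(2) on Linux), so
         // its cost grows with the mapping size even when every page is resident.
         long t0 = System.nanoTime();
         boolean loaded = seg.isLoaded();
         long t1 = System.nanoTime();
         System.out.printf("isLoaded=%s over %,d bytes took %.3f ms%n",
             loaded, seg.byteSize(), (t1 - t0) / 1e6);
       }
     }
   }
   ```

   The cost of that one call scales with the size of the mapping even when the 
file is fully cached, which is exactly the `O(numPages)` walk described above.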
   
   Gemini (Thinking) [goes into good detail here and shows the actual Java and 
C sources](https://gemini.google.com/share/5afabdba6172).  Oh, the power and 
terror of abstraction...
   
   Is this issue about sharing the precached/isLoaded state 
(`MemorySegmentIndexInput.consecutivePrefetchHitCount`) from the original 
`IndexInput` to its sliced/cloned ones?  Slicing because of compound files?  
Cloning because all the iterators a query creates will start by cloning the 
`MemorySegmentIndexInput`, and that [`clone` impl makes a new 
`MemorySegmentIndexInput` and resets that clone's `consecutivePrefetchHitCount` 
to 
0](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/store/MemorySegmentIndexInput.java#L518-L552).
  So every query begins again assuming nothing is prefetched yet, especially 
penalizing workloads that are mostly super-fast queries, I guess.
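
   For illustration only (this is not the Lucene code, and the class/method 
names here are made up), one way to let that state survive `clone()`/`slice()` 
would be for all clones of a file to share a single mutable counter instead of 
each clone starting over at 0:

   ```java
   import java.util.concurrent.atomic.AtomicInteger;

   // Hypothetical sketch: clones share one counter object, so "this file is hot"
   // knowledge learned by one clone is visible to later clones of the same file.
   class SharedPrefetchState {
     // AtomicInteger so updates from clones used on different threads stay visible.
     private final AtomicInteger consecutivePrefetchHitCount;

     SharedPrefetchState() {
       this(new AtomicInteger());
     }

     private SharedPrefetchState(AtomicInteger shared) {
       this.consecutivePrefetchHitCount = shared;
     }

     // A clone reuses the same counter rather than resetting it to 0.
     SharedPrefetchState cloneSharingState() {
       return new SharedPrefetchState(consecutivePrefetchHitCount);
     }

     void onPrefetchHit() {
       consecutivePrefetchHitCount.incrementAndGet();
     }

     void onPrefetchMiss() {
       consecutivePrefetchHitCount.set(0);
     }

     boolean looksHot(int threshold) {
       return consecutivePrefetchHitCount.get() >= threshold;
     }
   }
   ```

   Whether a shared counter is the right granularity (per file? per slice?) is 
an open question, of course.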
   
   Anyway, I don't have any good ideas on what to do about this.  Maybe ideally 
there would be some simple way for the user to turn the prefetch optimizations 
on/off?  Or maybe we could reduce the cost/impact of prefetch when things are 
already (nearly all) hot?  Not sure ...
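
   If we did want a user-facing kill switch, even something as crude as a 
system property gate could work; just as a sketch (the property name here is 
invented, not an existing Lucene option):

   ```java
   // Hypothetical sketch: gate the prefetch work behind a flag read once at startup
   // ("lucene.enablePrefetch" is a made-up property name, not a real Lucene option).
   final class PrefetchToggle {
     static final boolean PREFETCH_ENABLED =
         Boolean.parseBoolean(System.getProperty("lucene.enablePrefetch", "true"));

     static void maybePrefetch(Runnable doPrefetch) {
       if (PREFETCH_ENABLED) {
         doPrefetch.run(); // e.g. the madvise(MADV_WILLNEED) / isLoaded() work
       }
     }
   }
   ```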

