Re: [PR] Reduce the overhead of `IndexInput#prefetch` when data is cached in RAM. [lucene]

via GitHub Sun, 19 May 2024 14:20:50 -0700


jpountz commented on code in PR #13381:
URL: https://github.com/apache/lucene/pull/13381#discussion_r1606104586



##########
lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java:
##########
@@ -57,6 +58,7 @@ abstract class MemorySegmentIndexInput extends IndexInput implements RandomAcces
   MemorySegment
       curSegment; // redundant for speed: segments[curSegmentIndex], also marker if closed!
   long curPosition; // relative to curSegment, not globally
+  int consecutivePrefetchHitCount;

Review Comment:
   I wondered about clones and slices too. It sounds worth thinking about, but 
it's also not completely straightforward to get right. For instance, we have 
slices of the same index input that have different access patterns (e.g. cfs 
files), so they should likely track different counters to work properly. But 
tracking a counter per slice (offset/length) is somewhat complex, as we 
generally create slices from the original index input rather than by cloning an 
initial slice. Furthermore, the logic I suggested here *very quickly* starts 
skipping calls to `madvise` if `prefetch` is called on a memory region that is 
loaded in the page cache: the number of `madvise` calls is effectively 
logarithmic in the number of times `prefetch()` gets called. So this makes the 
number of `madvise` calls that a query performs mostly a function of the number 
of clones that the query creates. IMO this is already quite a good start, as we 
need to bound the number of clones that queries create anyway since clones are 
not free (especially with `NIOFSDirectory`'s buffer).
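
   To make the logarithmic behavior concrete, here is a minimal standalone 
sketch of the back-off idea, not the code in this PR. It assumes the expensive 
work (a residency check plus a possible `madvise`) only runs when the counter 
is zero or a power of two, and that the counter resets whenever the data turns 
out not to be cached. The class name, the `isLoaded` supplier and the 
`expensiveCalls` field are hypothetical and exist only for illustration.

```java
import java.util.function.BooleanSupplier;

/** Illustrative sketch only; names and structure are assumptions, not Lucene's code. */
final class PrefetchBackoffSketch {
  // Hypothetical stand-in for "is this memory region already in the page cache?"
  private final BooleanSupplier isLoaded;
  // Number of prefetch calls since the last time the data was found not cached.
  private int consecutivePrefetchHitCount;
  // Counts how often the expensive path (residency check, maybe madvise) ran.
  int expensiveCalls;

  PrefetchBackoffSketch(BooleanSupplier isLoaded) {
    this.isLoaded = isLoaded;
  }

  void prefetch() {
    int count = consecutivePrefetchHitCount++;
    // Skip the expensive path unless count is 0 or a power of two.
    if ((count & (count - 1)) != 0) {
      return;
    }
    expensiveCalls++;
    if (!isLoaded.getAsBoolean()) {
      // Data is not cached: reset so the next calls take the expensive path again.
      consecutivePrefetchHitCount = 0;
      // A real implementation would issue madvise(MADV_WILLNEED) here.
    }
  }

  /** Runs `calls` prefetches on a fresh counter and returns the expensive-call count. */
  static int simulate(int calls, boolean cached) {
    PrefetchBackoffSketch input = new PrefetchBackoffSketch(() -> cached);
    for (int i = 0; i < calls; i++) {
      input.prefetch();
    }
    return input.expensiveCalls;
  }
}
```

   Under these assumptions, `simulate(1000, true)` takes the expensive path 11 
times, while `simulate(1000, false)` takes it on all 1000 calls. If each clone 
starts with its own fresh counter, 10 clones issuing 100 cached prefetches each 
take the expensive path 8 times per clone, 80 in total, which is why the clone 
count ends up dominating.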
   
   I still think it's worth exploring, but let's do it in a follow-up?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

