jpountz commented on PR #13364:
URL: https://github.com/apache/lucene/pull/13364#issuecomment-2109954553

   > This is cool! In the hot case, do we expect prefetch to be a no-op? So we 
are hoping for "first do no harm" in that case?
   
   Yes, mostly. The benchmark I ran at 
https://github.com/apache/lucene/pull/13337#issuecomment-2095430556 suggests 
that there is a 1-2 us overhead per prefetch call when the data already fits 
in the page cache. (Possibly more under concurrent search, I'd need to 
benchmark this as well.) So with 50 segments, 4 clauses and assuming 3 
prefetch calls per clause (one in the terms dict, one for postings, one for 
skip data), this would give a worst-case total overhead of 3 calls * 2us * 
50 segments * 4 clauses = 1200us = 1.2ms, which looks ok to me.
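
   To make the arithmetic concrete, here is a back-of-the-envelope sketch in 
plain Java. The per-call overhead and the number of prefetch calls per clause 
are assumptions taken from the discussion above, not measurements from this 
PR:

   ```java
   // Rough worst-case estimate of prefetch overhead in the hot case,
   // using the illustrative numbers from the comment above.
   public class PrefetchOverheadEstimate {
     public static void main(String[] args) {
       int callsPerClause = 3; // terms dict + postings + skip data
       int microsPerCall = 2;  // upper bound from the #13337 benchmark
       int segments = 50;
       int clauses = 4;
       int totalMicros = callsPerClause * microsPerCall * segments * clauses;
       System.out.println(totalMicros + "us"); // prints 1200us, i.e. 1.2ms
     }
   }
   ```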
   
   > But in the cold case, this could cause multiple prefetch calls within one 
segment if the query has multiple terms. And if cross-segment concurrency is 
enabled, multiple such calls across slices (thread work units) too.
   
   This is correct. We could also look into doing inter-segment I/O 
concurrency within the same thread in the future, e.g. by fetching all 
scorers first and only then evaluating these scorers against their segments, 
but I need to think more about the trade-offs.
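
   For the record, here is a minimal sketch of what that idea could look 
like. It is just one possible shape, not something this PR implements; 
`weight` would come from the query and `collect` is a placeholder for 
whatever consumes hits:

   ```java
   import java.io.IOException;
   import java.util.ArrayList;
   import java.util.List;
   import org.apache.lucene.index.IndexReader;
   import org.apache.lucene.index.LeafReaderContext;
   import org.apache.lucene.search.DocIdSetIterator;
   import org.apache.lucene.search.Scorer;
   import org.apache.lucene.search.Weight;

   class InterSegmentPrefetchSketch {
     // Create all scorers up front so that every segment gets a chance to
     // issue its prefetch calls, then score segments one at a time: by the
     // time we evaluate the first segment, I/O for the other segments is
     // hopefully already in flight.
     static void searchAllLeaves(IndexReader reader, Weight weight) throws IOException {
       List<Scorer> scorers = new ArrayList<>();
       for (LeafReaderContext leaf : reader.leaves()) {
         Scorer scorer = weight.scorer(leaf); // scorer creation is where prefetching would happen
         if (scorer != null) {
           scorers.add(scorer);
         }
       }
       for (Scorer scorer : scorers) {
         DocIdSetIterator it = scorer.iterator();
         for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
           collect(doc); // placeholder; doc IDs here are segment-local
         }
       }
     }

     static void collect(int doc) {
       // placeholder for collection logic
     }
   }
   ```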
   
   > But in the cold case this may be a big win?
   
   Yes! Hopefully, with a few more changes, I'll be able to confirm this with 
a benchmark that runs actual Lucene queries. Early experiments suggest that 
there is indeed a big win.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

