rmuir commented on code in PR #13364: URL: https://github.com/apache/lucene/pull/13364#discussion_r1599789208
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99PostingsReader.java:
##########

```diff
@@ -2049,6 +2074,44 @@ public long cost() {
     }
   }
 
+  private void seekAndPrefetchPostings(IndexInput docIn, IntBlockTermState state)
+      throws IOException {
+    if (docIn.getFilePointer() != state.docStartFP) {
+      // Don't prefetch if the input is already positioned at the right offset, which suggests that
+      // the caller is streaming the entire inverted index (e.g. for merging), let the read-ahead
+      // logic do its work instead. Note that this heuristic doesn't work for terms that have skip
+      // data, since skip data is stored after the last term, but handling all terms that have <128
+      // docs is a good start already.
+      docIn.seek(state.docStartFP);
+      if (state.skipOffset < 0) {
+        // This postings list is very short as it doesn't have skip data, prefetch the page that
+        // holds the first byte of the postings list.
+        docIn.prefetch(1);
+      } else if (state.skipOffset <= MAX_POSTINGS_SIZE_FOR_FULL_PREFETCH) {
+        // This postings list is short as it fits on a few pages, prefetch it all, plus one byte to
+        // make sure to include some skip data.
+        docIn.prefetch(state.skipOffset + 1);
```

Review Comment:

I'm still trying to wrap my head around this `<= MAX_POSTINGS_SIZE_FOR_FULL_PREFETCH` case. If the postings are short enough that we are willing to fault them all in at once, why do we even index skip data at all?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
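For readers following the discussion, the branching in the hunk above can be sketched as a standalone decision function. This is a minimal sketch, not Lucene's implementation: the class and method names, the threshold value, and the fallback for lists longer than the threshold (the hunk is truncated at that point) are all assumptions.

```java
/**
 * Sketch of the prefetch heuristic discussed above. Returns the number of
 * bytes that would be passed to IndexInput#prefetch at docStartFP, or -1
 * when no prefetch should be issued at all. Not actual Lucene code.
 */
final class PrefetchHeuristic {

  // Assumed threshold (a few 4 KiB pages); the real constant's value is
  // not visible in the hunk above.
  static final long MAX_POSTINGS_SIZE_FOR_FULL_PREFETCH = 16_384;

  static long bytesToPrefetch(long filePointer, long docStartFP, long skipOffset) {
    if (filePointer == docStartFP) {
      // Already positioned: the caller is likely streaming the whole
      // inverted index (e.g. merging); rely on OS read-ahead instead.
      return -1;
    }
    if (skipOffset < 0) {
      // No skip data, so fewer than 128 docs: touch only the page holding
      // the first byte of the postings list.
      return 1;
    }
    if (skipOffset <= MAX_POSTINGS_SIZE_FOR_FULL_PREFETCH) {
      // Short list spanning a few pages: prefetch all of it, plus one
      // byte so that some skip data is included.
      return skipOffset + 1;
    }
    // The hunk is truncated here; assume long lists prefetch at least the
    // first page (an assumption, not shown in the diff).
    return 1;
  }
}
```

The review comment targets the third branch: if `skipOffset` is small enough that faulting the whole list in at once is acceptable, the skip data that branch preserves may not be needed at all.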