rmuir commented on code in PR #13364: URL: https://github.com/apache/lucene/pull/13364#discussion_r1599789208
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99PostingsReader.java:
##########

```diff
@@ -2049,6 +2074,44 @@ public long cost() {
     }
   }
 
+  private void seekAndPrefetchPostings(IndexInput docIn, IntBlockTermState state)
+      throws IOException {
+    if (docIn.getFilePointer() != state.docStartFP) {
+      // Don't prefetch if the input is already positioned at the right offset, which suggests that
+      // the caller is streaming the entire inverted index (e.g. for merging), let the read-ahead
+      // logic do its work instead. Note that this heuristic doesn't work for terms that have skip
+      // data, since skip data is stored after the last term, but handling all terms that have <128
+      // docs is a good start already.
+      docIn.seek(state.docStartFP);
+      if (state.skipOffset < 0) {
+        // This postings list is very short as it doesn't have skip data, prefetch the page that
+        // holds the first byte of the postings list.
+        docIn.prefetch(1);
+      } else if (state.skipOffset <= MAX_POSTINGS_SIZE_FOR_FULL_PREFETCH) {
+        // This postings list is short as it fits on a few pages, prefetch it all, plus one byte to
+        // make sure to include some skip data.
+        docIn.prefetch(state.skipOffset + 1);
```

Review Comment:

I'm still trying to wrap my head around this `<= MAX_POSTINGS_SIZE_FOR_FULL_PREFETCH` case. If the postings are short enough that we are willing to fault them all in at once, why do we even index skip data at all?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
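For readers following the discussion, the branching in the hunk above can be sketched as a standalone decision function. This is a minimal sketch, not Lucene's implementation: the class and method names, the threshold value, and the fallback for lists longer than the threshold (the hunk is truncated at that point) are all assumptions.

```java
/**
 * Sketch of the prefetch heuristic discussed above. Returns the number of
 * bytes that would be passed to IndexInput#prefetch at docStartFP, or -1
 * when no prefetch should be issued at all. Not actual Lucene code.
 */
final class PrefetchHeuristic {

  // Assumed threshold (a few 4 KiB pages); the real constant's value is
  // not visible in the hunk above.
  static final long MAX_POSTINGS_SIZE_FOR_FULL_PREFETCH = 16_384;

  static long bytesToPrefetch(long filePointer, long docStartFP, long skipOffset) {
    if (filePointer == docStartFP) {
      // Already positioned: the caller is likely streaming the whole
      // inverted index (e.g. merging); rely on OS read-ahead instead.
      return -1;
    }
    if (skipOffset < 0) {
      // No skip data, so fewer than 128 docs: touch only the page holding
      // the first byte of the postings list.
      return 1;
    }
    if (skipOffset <= MAX_POSTINGS_SIZE_FOR_FULL_PREFETCH) {
      // Short list spanning a few pages: prefetch all of it, plus one
      // byte so that some skip data is included.
      return skipOffset + 1;
    }
    // The hunk is truncated here; assume long lists prefetch at least the
    // first page (an assumption, not shown in the diff).
    return 1;
  }
}
```

The review comment targets the third branch: if `skipOffset` is small enough that faulting the whole list in at once is acceptable, the skip data that branch preserves may not be needed at all.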