jpountz commented on code in PR #14511: URL: https://github.com/apache/lucene/pull/14511#discussion_r2058399082
########## lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java: ########## @@ -1286,14 +1298,11 @@ public long cost() { @Override public int numLevels() { - return indexHasFreq == false || level1LastDocID == NO_MORE_DOCS ? 1 : 2; + return level1LastDocID == NO_MORE_DOCS ? 1 : 2; } @Override public int getDocIdUpTo(int level) { - if (indexHasFreq == false) { - return NO_MORE_DOCS; - } Review Comment: Likewise, let's keep this? ########## lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java: ########## @@ -1286,14 +1298,11 @@ public long cost() { @Override public int numLevels() { - return indexHasFreq == false || level1LastDocID == NO_MORE_DOCS ? 1 : 2; + return level1LastDocID == NO_MORE_DOCS ? 1 : 2; Review Comment: I would keep it the way it was for now. Exposing a single level that covers the whole doc ID space may cause inefficiencies with some scorers, but we should fix these scorers instead of tuning this class. If all docs have the same score, it should be just fine to return a single level of impact that covers the whole doc ID space. ########## lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java: ########## @@ -282,6 +290,10 @@ public PostingsEnum postings( @Override public ImpactsEnum impacts(FieldInfo fieldInfo, BlockTermState state, int flags) throws IOException { + if (state.docFreq <= BLOCK_SIZE) { + // no skip data + return new SlowImpactsEnum(postings(fieldInfo, state, null, flags)); + } Review Comment: Let's not use `SlowImpactsEnum` again or it would cancel the speedup from https://github.com/apache/lucene/pull/14017 (annotation HJ at https://benchmarks.mikemccandless.com/OrHighHigh.html). ########## lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java: ########## @@ -66,13 +67,20 @@ public final class Lucene103PostingsReader extends PostingsReaderBase { static final VectorizationProvider VECTORIZATION_PROVIDER = VectorizationProvider.getInstance(); + // Dummy impacts, composed of the maximum possible term frequency and the lowest possible // (unsigned) norm value. This is typically used on tail blocks, which don't actually record - // impacts as the storage overhead would not be worth any query evaluation speedup, since there's + // impacts as the storage overhead would not be worth any query evaluation speedup, since + // there's // less than 128 docs left to evaluate anyway. private static final List<Impact> DUMMY_IMPACTS = Collections.singletonList(new Impact(Integer.MAX_VALUE, 1L)); + // We stopped storing a placeholder impact with freq=1 for fields with IndexOptions.DOCS after + // 9.12.0 + private static final List<Impact> NON_COMPETITIVE_IMPACTS = + Collections.singletonList(new Impact(1, 1L)); Review Comment: Maybe call it `DUMMY_IMPACTS_NO_FREQS` or something like that to draw a parallel with `DUMMY_IMPACTS` above? ########## lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java: ########## @@ -1309,8 +1318,9 @@ public List<Impact> getImpacts(int level) { if (level == 1) { return readImpacts(level1SerializedImpacts, level1Impacts); } + return DUMMY_IMPACTS; } - return DUMMY_IMPACTS; + return NON_COMPETITIVE_IMPACTS; Review Comment: I would keep it the way it was before and just add this at the beginning of this method (similar to `getDocIdUpTo`): ```java if (indexHasFreq == false) { return DUMMY_IMPACTS_NO_FREQS; } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org