Re: [PR] Dry up both the two ImpactsEnum Implementation ins Lucene912PostingsReader [lucene]

via GitHub Sat, 12 Oct 2024 02:41:11 -0700


jpountz commented on code in PR #13892:
URL: https://github.com/apache/lucene/pull/13892#discussion_r1797607993



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java:
##########
@@ -1203,70 +1175,53 @@ public long cost() {
     }
   }
 
-  final class BlockImpactsDocsEnum extends ImpactsEnum {
+  private abstract class BlockImpactsEnum extends ImpactsEnum {
 
-    final ForDeltaUtil forDeltaUtil = new ForDeltaUtil();
-    final PForUtil pforUtil = new PForUtil(new ForUtil());
+    protected final ForDeltaUtil forDeltaUtil = new ForDeltaUtil();
+    protected final PForUtil pforUtil = new PForUtil(new ForUtil());
 
-    private final long[] docBuffer = new long[BLOCK_SIZE + 1];
-    private final long[] freqBuffer = new long[BLOCK_SIZE];
+    protected final long[] docBuffer = new long[BLOCK_SIZE + 1];
+    protected final long[] freqBuffer = new long[BLOCK_SIZE];
 
-    private int docBufferUpto;
+    protected final int docFreq; // number of docs in this posting list
 
-    final IndexInput docIn;
-    final PostingDecodingUtil docInUtil;
-    final boolean indexHasFreq;
-    final boolean indexHasPos;
+    protected final IndexInput docIn;
+    protected final PostingDecodingUtil docInUtil;
 
-    private final int docFreq; // number of docs in this posting list
-    private int docCountUpto; // number of docs in or before the current block
-    private int doc; // doc we last read
-    private long prevDocID; // last doc ID of the previous block
-    private long freqFP;
+    protected int docCountUpto; // number of docs in or before the current 
block
+    protected int doc = -1; // doc we last read
+    protected long prevDocID = -1; // last doc ID of the previous block
+    protected int docBufferUpto = BLOCK_SIZE;
 
     // true if we shallow-advanced to a new block that we have not decoded yet
-    private boolean needsRefilling;
+    protected boolean needsRefilling;
 
     // level 0 skip data
-    private int level0LastDocID;
-    private long level0DocEndFP;
-    private final BytesRef level0SerializedImpacts;
-    private final MutableImpactList level0Impacts;
+    protected int level0LastDocID = -1;
+    protected long level0DocEndFP;
+    protected final BytesRef level0SerializedImpacts;
+    protected final MutableImpactList level0Impacts;
     // level 1 skip data
-    private int level1LastDocID;
-    private long level1DocEndFP;
-    private int level1DocCountUpto;
-    private final BytesRef level1SerializedImpacts;
-    private final MutableImpactList level1Impacts;
-
-    public BlockImpactsDocsEnum(FieldInfo fieldInfo, IntBlockTermState 
termState)
-        throws IOException {
-      indexHasFreq = 
fieldInfo.getIndexOptions().compareTo(IndexOptions.DOCS_AND_FREQS) >= 0;
-      indexHasPos =
-          
fieldInfo.getIndexOptions().compareTo(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS)
 >= 0;
-      // We set the last element of docBuffer to NO_MORE_DOCS, it helps save 
conditionals in
-      // advance()
-      docBuffer[BLOCK_SIZE] = NO_MORE_DOCS;
-
-      docFreq = termState.docFreq;
+    protected int level1LastDocID;
+    protected long level1DocEndFP;
+    protected int level1DocCountUpto = 0;
+    protected final BytesRef level1SerializedImpacts;
+    protected final MutableImpactList level1Impacts;
+
+    private BlockImpactsEnum(IntBlockTermState termState) throws IOException {
+      this.docFreq = termState.docFreq;
       if (docFreq > 1) {
-        docIn = Lucene912PostingsReader.this.docIn.clone();
-        docInUtil = VECTORIZATION_PROVIDER.newPostingDecodingUtil(docIn);
+        this.docIn = Lucene912PostingsReader.this.docIn.clone();
+        this.docInUtil = VECTORIZATION_PROVIDER.newPostingDecodingUtil(docIn);
         prefetchPostings(docIn, termState);
       } else {

Review Comment:
   Not due to your change, but this else branch seems to be dead code since we 
use `SlowImpactsEnum` whenever `docFreq < BLOCK_SIZE`.



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java:
##########
@@ -36,18 +36,9 @@
 import org.apache.lucene.codecs.CodecUtil;
 import org.apache.lucene.codecs.PostingsReaderBase;
 import 
org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.IntBlockTermState;
-import org.apache.lucene.index.FieldInfo;
-import org.apache.lucene.index.Impact;
-import org.apache.lucene.index.Impacts;
-import org.apache.lucene.index.ImpactsEnum;
-import org.apache.lucene.index.IndexFileNames;
-import org.apache.lucene.index.IndexOptions;
-import org.apache.lucene.index.PostingsEnum;
-import org.apache.lucene.index.SegmentReadState;
-import org.apache.lucene.index.SlowImpactsEnum;
+import org.apache.lucene.index.*;

Review Comment:
   Nit: We prefer to avoid star imports.



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java:
##########
@@ -1326,15 +1347,12 @@ private void refillDocs() throws IOException {
 
       if (left >= BLOCK_SIZE) {
         forDeltaUtil.decodeAndPrefixSum(docInUtil, prevDocID, docBuffer);
-
-        if (indexHasFreq) {
-          freqFP = docIn.getFilePointer();
-          PForUtil.skip(docIn);
-        }

Review Comment:
   Thanks for simplifying.



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java:
##########
@@ -1316,8 +1257,88 @@ public BytesRef getPayload() {
     }
 
     @Override
-    public int docID() {
-      return doc;
+    public long cost() {
+      return docFreq;
+    }
+
+    private final Impacts impacts =
+        new Impacts() {
+
+          private final ByteArrayDataInput scratch = new ByteArrayDataInput();
+
+          @Override
+          public int numLevels() {
+            int numLevels = 0;
+            if (level0LastDocID != NO_MORE_DOCS) {
+              numLevels++;
+            }
+            if (level1LastDocID != NO_MORE_DOCS) {
+              numLevels++;
+            }
+            if (numLevels == 0) {
+              numLevels++;
+            }
+            return numLevels;
+          }
+
+          @Override
+          public int getDocIdUpTo(int level) {
+            if (level == 0) {
+              return level0LastDocID;
+            }
+            return level == 1 ? level1LastDocID : NO_MORE_DOCS;
+          }
+
+          @Override
+          public List<Impact> getImpacts(int level) {
+            if (level == 0 && level0LastDocID != NO_MORE_DOCS) {
+              return readImpacts(level0SerializedImpacts, level0Impacts);
+            }
+            if (level == 1 && level1LastDocID != NO_MORE_DOCS) {
+              return readImpacts(level1SerializedImpacts, level1Impacts);
+            }
+            return DUMMY_IMPACTS;
+          }

Review Comment:
   I see you refactored this compared with how it's implemented in main. It 
looks simpler this way, and I believe it behaves the same since 
`level0LastDocID` being equal to NO_MORE_DOCS should imply `level1LastDocID` 
being equal to NO_MORE_DOCS too. Have you run luceneutil to check if we indeed 
return the same impacts, so performance shouldn't be affected?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] Dry up both the two ImpactsEnum Implementation ins Lucene912PostingsReader [lucene]

Reply via email to