Re: [PR] Add levels to DocValues skipper index [lucene]

via GitHub Mon, 15 Jul 2024 03:37:39 -0700


iverase commented on code in PR #13563:
URL: https://github.com/apache/lucene/pull/13563#discussion_r1677624311



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesProducer.java:
##########
@@ -1792,61 +1794,88 @@ public DocValuesSkipper getSkipper(FieldInfo field) 
throws IOException {
     if (input.length() > 0) {
       input.prefetch(0, 1);
     }
+    // TODO: should we write to disk the actual max level for this segment?
     return new DocValuesSkipper() {
-      int minDocID = -1;
-      int maxDocID = -1;
-      long minValue, maxValue;
-      int docCount;
+      final int[] minDocID = new int[SKIP_INDEX_MAX_LEVEL];
+      final int[] maxDocID = new int[SKIP_INDEX_MAX_LEVEL];
+
+      {
+        for (int i = 0; i < SKIP_INDEX_MAX_LEVEL; i++) {
+          minDocID[i] = maxDocID[i] = -1;
+        }
+      }
+
+      final long[] minValue = new long[SKIP_INDEX_MAX_LEVEL];
+      final long[] maxValue = new long[SKIP_INDEX_MAX_LEVEL];
+      final int[] docCount = new int[SKIP_INDEX_MAX_LEVEL];
+      int levels;
 
       @Override
       public void advance(int target) throws IOException {
         if (target > entry.maxDocId) {
-          minDocID = DocIdSetIterator.NO_MORE_DOCS;
-          maxDocID = DocIdSetIterator.NO_MORE_DOCS;
+          // skipper is exhausted
+          for (int i = 0; i < SKIP_INDEX_MAX_LEVEL; i++) {
+            minDocID[i] = maxDocID[i] = DocIdSetIterator.NO_MORE_DOCS;
+          }
         } else {
+          // find next interval
+          assert target > maxDocID[0] : "target must be bigger that current 
interval";
           while (true) {
-            maxDocID = input.readInt();
-            if (maxDocID >= target) {
-              minDocID = input.readInt();
-              maxValue = input.readLong();
-              minValue = input.readLong();
-              docCount = input.readInt();
+            levels = input.readByte();

Review Comment:
   This is done in purpose as I am thinking in this data structure as a skip 
list, no a tree, so any given interval at any level should only be evaluated 
once and therefore only appear once while reading the index.
   
   The idea is that users only see new levels the first time they get updated. 
At that point, user can decide if they want to skip it or not. If they don't 
skip it, then they go inside that level but they should not have to evaluate it 
again and therefore it should not appear in lower levels. 
   
   I think otherwise, we will be evaluating the same interval many times?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Add levels to DocValues skipper index [lucene]

Reply via email to