richardstartin commented on a change in pull request #8411:
URL: https://github.com/apache/pinot/pull/8411#discussion_r837850389



##########
File path: 
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/BitmapBasedFilterOperator.java
##########
@@ -100,6 +101,67 @@ protected FilterBlock getNextBlock() {
     }
   }
 
+  @Override
+  public boolean canOptimizeCount() {
+    return true;
+  }
+
+  @Override
+  public int getNumMatchingDocs() {
+    int count = 0;
+    if (_invertedIndexReader == null) {
+      count = _docIds.getCardinality();
+    } else {
+      int[] dictIds = _exclusive
+          ? _predicateEvaluator.getNonMatchingDictIds()
+          : _predicateEvaluator.getMatchingDictIds();
+      switch (dictIds.length) {
+        case 0:
+          break;
+        case 1: {
+          count = _invertedIndexReader.getDocIds(dictIds[0]).getCardinality();
+          break;
+        }
+        case 2: {
+          count = 
ImmutableRoaringBitmap.orCardinality(_invertedIndexReader.getDocIds(dictIds[0]),
+              _invertedIndexReader.getDocIds(dictIds[1]));
+          break;
+        }
+        default: {
+          // this could be optimised if the bitmaps are known to be disjoint 
(as in a single value bitmap index)
+          MutableRoaringBitmap bitmap = new MutableRoaringBitmap();
+          for (int dictId : dictIds) {
+            bitmap.or(_invertedIndexReader.getDocIds(dictId));
+          }
+          count = bitmap.getCardinality();
+          break;
+        }
+      }
+    }
+    return _exclusive ? _numDocs - count : count;
+  }
+
+  @Override
+  public boolean canProduceBitmaps() {
+    return true;
+  }
+
+  @Override
+  public BitmapCollection getBitmaps() {
+    if (_docIds == null) {
+      int[] dictIds = _exclusive
+          ? _predicateEvaluator.getNonMatchingDictIds()
+          : _predicateEvaluator.getMatchingDictIds();
+      ImmutableRoaringBitmap[] bitmaps = new 
ImmutableRoaringBitmap[dictIds.length];
+      for (int i = 0; i < dictIds.length; i++) {
+        bitmaps[i] = (_invertedIndexReader.getDocIds(dictIds[i]));

Review comment:
       Perhaps I should clarify the reasons not to merge bitmaps eagerly
   1. `or` is expensive, its size increases monotonically
   2. flipping bits is expensive, it maps sparse to dense by filling in holes
   3. (not implemented yet) it's always better to distribute intersections 
across unions because it increases sparsity before doing the union
   4. it may never be necessary to do the `or` at all, it can be implemented 
with `andNot` from a universe bitmap, this approach is generally 2x faster than 
eagerly computing an `or`
   5. these bitmaps are all just handles around mapped memory and may never 
need to be copied




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to