[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #9173: Proper null handling in equality, inequality and membership operators for all SV column data types

GitBox Fri, 05 Aug 2022 15:36:20 -0700


Jackie-Jiang commented on code in PR #9173:
URL: https://github.com/apache/pinot/pull/9173#discussion_r939380727



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/dociditerators/SVScanDocIdIterator.java:
##########
@@ -48,12 +49,18 @@ public final class SVScanDocIdIterator implements 
ScanBasedDocIdIterator {
   private int _nextDocId = 0;
   private long _numEntriesScanned = 0L;
 
-  public SVScanDocIdIterator(PredicateEvaluator predicateEvaluator, 
ForwardIndexReader reader, int numDocs) {
+  public SVScanDocIdIterator(PredicateEvaluator predicateEvaluator, 
ForwardIndexReader reader, int numDocs,
+      NullValueVectorReader nullValueReader, boolean nullHandlingEnabled) {

Review Comment:
   We may simplify it. When null-handling is not enabled, do not pass in 
`nullValueReader`
   ```suggestion
         @Nullable NullValueVectorReader nullValueReader) {
   ```



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/dociditerators/SVScanDocIdIterator.java:
##########
@@ -122,25 +129,25 @@ public long getNumEntriesScanned() {
     return _numEntriesScanned;
   }
 
-  private ValueMatcher getValueMatcher() {
+  private ValueMatcher getValueMatcher(ImmutableRoaringBitmap nullBitmap) {

Review Comment:
   (nit)
   ```suggestion
     private ValueMatcher getValueMatcher(@Nullable ImmutableRoaringBitmap 
nullBitmap) {
   ```



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/dociditerators/SVScanDocIdIterator.java:
##########
@@ -188,6 +246,32 @@ public int matchValues(int limit, int[] docIds) {
     }
   }
 
+  private class DictIdMatcherAndNullHandler implements ValueMatcher {
+
+    private final int[] _buffer = new int[OPTIMAL_ITERATOR_BATCH_SIZE];
+    private final ImmutableRoaringBitmap _nullBitmap;
+
+    public DictIdMatcherAndNullHandler(ImmutableRoaringBitmap nullBitmap) {
+      _nullBitmap = nullBitmap;
+    }
+
+    @Override
+    public boolean doesValueMatch(int docId) {
+      if (_nullBitmap.contains(docId)) {
+        return false;
+      }
+      return _predicateEvaluator.applySV(_reader.getDictId(docId, 
_readerContext));
+    }
+
+    @Override
+    public int matchValues(int limit, int[] docIds) {
+      _reader.readDictIds(docIds, limit, _buffer, _readerContext);
+      int newLimit = MatcherUtils.removeNullDocs(docIds, _buffer, limit, 
_nullBitmap);
+      assert newLimit < limit;

Review Comment:
   Should this be `<=`?



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/predicate/PredicateUtils.java:
##########
@@ -76,7 +76,7 @@ public static IntSet getDictIdSet(BaseInPredicate 
inPredicate, Dictionary dictio
     IntSet dictIdSet = new IntOpenHashSet(hashSetSize);
     switch (dataType.getStoredType()) {
       case INT:
-        int[] intValues = inPredicate.getIntValues();
+        int[] intValues = dataType == DataType.BOOLEAN ? 
inPredicate.getBooleanValues() : inPredicate.getIntValues();

Review Comment:
   Good catch! The clean fix should be change the switch to switch on 
`dataType` so that we can handle `BOOLEAN` and `TIMESTAMP` properly.
   Do you need this change for the purpose of this PR? If not, I'd suggest 
having a separate PR just for this fix so that the scope is clear



##########
pinot-core/src/main/java/org/apache/pinot/core/query/reduce/GapfillFilterHandler.java:
##########
@@ -48,7 +48,7 @@ public GapfillFilterHandler(FilterContext filter, DataSchema 
dataSchema) {
       // TODO: Please refer to {@link PostAggregationHandler} on how to handle 
the index for aggregation queries.
       _indexes.put(_dataSchema.getColumnName(i), i);
     }
-    _rowMatcher = RowMatcherFactory.getRowMatcher(filter, this);
+    _rowMatcher = RowMatcherFactory.getRowMatcher(filter, this, false);

Review Comment:
   Let's add a TODO here to support it later



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/dociditerators/SVScanDocIdIterator.java:
##########
@@ -204,6 +288,34 @@ public int matchValues(int limit, int[] docIds) {
     }
   }
 
+  private class IntMatcherAndNullHandler implements ValueMatcher {
+
+    private final ImmutableRoaringBitmap _nullBitmap;
+    private final int[] _buffer = new int[OPTIMAL_ITERATOR_BATCH_SIZE];
+
+    public IntMatcherAndNullHandler(ImmutableRoaringBitmap nullBitmap) {
+      _nullBitmap = nullBitmap;
+    }
+
+    @Override
+    public boolean doesValueMatch(int docId) {
+      if (_nullBitmap.contains(docId)) {
+        // Any comparison (equality, inequality, or membership) with null 
results in false (similar to Presto) even if

Review Comment:
   Does this mean we should use `IS NULL` and `IS NOT NULL`? We should add it 
to the comment
   
   Also, suggest either copy the comment to all the classes, or move it to the 
first class (`DictIdMatcherAndNullHandler`)



##########
pinot-core/src/main/java/org/apache/pinot/core/query/reduce/filter/PredicateRowMatcher.java:
##########
@@ -42,7 +43,10 @@ public PredicateRowMatcher(Predicate predicate, 
ValueExtractor valueExtractor) {
 
   @Override
   public boolean isMatch(Object[] row) {
-    Object value = _valueExtractor.extract(row);
+    return isMatch(_valueExtractor.extract(row));

Review Comment:
   Per my past experience, the overhead of adding a sub-class might be higher 
than an extra `null` check, so suggest just perform the `null` check here and 
remove the sub-class



##########
pinot-core/src/main/java/org/apache/pinot/core/query/reduce/HavingFilterHandler.java:
##########
@@ -30,7 +30,12 @@ public class HavingFilterHandler {
   private final RowMatcher _rowMatcher;
 
   public HavingFilterHandler(FilterContext havingFilter, 
PostAggregationHandler postAggregationHandler) {

Review Comment:
   Suggest removing this one if it is just used by tests. We want to enforce 
people to support null handling from now on



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #9173: Proper null handling in equality, inequality and membership operators for all SV column data types

Reply via email to