Jackie-Jiang commented on code in PR #9173: URL: https://github.com/apache/pinot/pull/9173#discussion_r939380727
########## pinot-core/src/main/java/org/apache/pinot/core/operator/dociditerators/SVScanDocIdIterator.java: ########## @@ -48,12 +49,18 @@ public final class SVScanDocIdIterator implements ScanBasedDocIdIterator { private int _nextDocId = 0; private long _numEntriesScanned = 0L; - public SVScanDocIdIterator(PredicateEvaluator predicateEvaluator, ForwardIndexReader reader, int numDocs) { + public SVScanDocIdIterator(PredicateEvaluator predicateEvaluator, ForwardIndexReader reader, int numDocs, + NullValueVectorReader nullValueReader, boolean nullHandlingEnabled) { Review Comment: We may simplify it. When null-handling is not enabled, do not pass in `nullValueReader` ```suggestion @Nullable NullValueVectorReader nullValueReader) { ``` ########## pinot-core/src/main/java/org/apache/pinot/core/operator/dociditerators/SVScanDocIdIterator.java: ########## @@ -122,25 +129,25 @@ public long getNumEntriesScanned() { return _numEntriesScanned; } - private ValueMatcher getValueMatcher() { + private ValueMatcher getValueMatcher(ImmutableRoaringBitmap nullBitmap) { Review Comment: (nit) ```suggestion private ValueMatcher getValueMatcher(@Nullable ImmutableRoaringBitmap nullBitmap) { ``` ########## pinot-core/src/main/java/org/apache/pinot/core/operator/dociditerators/SVScanDocIdIterator.java: ########## @@ -188,6 +246,32 @@ public int matchValues(int limit, int[] docIds) { } } + private class DictIdMatcherAndNullHandler implements ValueMatcher { + + private final int[] _buffer = new int[OPTIMAL_ITERATOR_BATCH_SIZE]; + private final ImmutableRoaringBitmap _nullBitmap; + + public DictIdMatcherAndNullHandler(ImmutableRoaringBitmap nullBitmap) { + _nullBitmap = nullBitmap; + } + + @Override + public boolean doesValueMatch(int docId) { + if (_nullBitmap.contains(docId)) { + return false; + } + return _predicateEvaluator.applySV(_reader.getDictId(docId, _readerContext)); + } + + @Override + public int matchValues(int limit, int[] docIds) { + _reader.readDictIds(docIds, limit, _buffer, _readerContext); + int newLimit = MatcherUtils.removeNullDocs(docIds, _buffer, limit, _nullBitmap); + assert newLimit < limit; Review Comment: Should this be `<=`? ########## pinot-core/src/main/java/org/apache/pinot/core/operator/filter/predicate/PredicateUtils.java: ########## @@ -76,7 +76,7 @@ public static IntSet getDictIdSet(BaseInPredicate inPredicate, Dictionary dictio IntSet dictIdSet = new IntOpenHashSet(hashSetSize); switch (dataType.getStoredType()) { case INT: - int[] intValues = inPredicate.getIntValues(); + int[] intValues = dataType == DataType.BOOLEAN ? inPredicate.getBooleanValues() : inPredicate.getIntValues(); Review Comment: Good catch! The clean fix should be change the switch to switch on `dataType` so that we can handle `BOOLEAN` and `TIMESTAMP` properly. Do you need this change for the purpose of this PR? If not, I'd suggest having a separate PR just for this fix so that the scope is clear ########## pinot-core/src/main/java/org/apache/pinot/core/query/reduce/GapfillFilterHandler.java: ########## @@ -48,7 +48,7 @@ public GapfillFilterHandler(FilterContext filter, DataSchema dataSchema) { // TODO: Please refer to {@link PostAggregationHandler} on how to handle the index for aggregation queries. _indexes.put(_dataSchema.getColumnName(i), i); } - _rowMatcher = RowMatcherFactory.getRowMatcher(filter, this); + _rowMatcher = RowMatcherFactory.getRowMatcher(filter, this, false); Review Comment: Let's add a TODO here to support it later ########## pinot-core/src/main/java/org/apache/pinot/core/operator/dociditerators/SVScanDocIdIterator.java: ########## @@ -204,6 +288,34 @@ public int matchValues(int limit, int[] docIds) { } } + private class IntMatcherAndNullHandler implements ValueMatcher { + + private final ImmutableRoaringBitmap _nullBitmap; + private final int[] _buffer = new int[OPTIMAL_ITERATOR_BATCH_SIZE]; + + public IntMatcherAndNullHandler(ImmutableRoaringBitmap nullBitmap) { + _nullBitmap = nullBitmap; + } + + @Override + public boolean doesValueMatch(int docId) { + if (_nullBitmap.contains(docId)) { + // Any comparison (equality, inequality, or membership) with null results in false (similar to Presto) even if Review Comment: Does this mean we should use `IS NULL` and `IS NOT NULL`? We should add it to the comment Also, suggest either copy the comment to all the classes, or move it to the first class (`DictIdMatcherAndNullHandler`) ########## pinot-core/src/main/java/org/apache/pinot/core/query/reduce/filter/PredicateRowMatcher.java: ########## @@ -42,7 +43,10 @@ public PredicateRowMatcher(Predicate predicate, ValueExtractor valueExtractor) { @Override public boolean isMatch(Object[] row) { - Object value = _valueExtractor.extract(row); + return isMatch(_valueExtractor.extract(row)); Review Comment: Per my past experience, the overhead of adding a sub-class might be higher than an extra `null` check, so suggest just perform the `null` check here and remove the sub-class ########## pinot-core/src/main/java/org/apache/pinot/core/query/reduce/HavingFilterHandler.java: ########## @@ -30,7 +30,12 @@ public class HavingFilterHandler { private final RowMatcher _rowMatcher; public HavingFilterHandler(FilterContext havingFilter, PostAggregationHandler postAggregationHandler) { Review Comment: Suggest removing this one if it is just used by tests. We want to enforce people to support null handling from now on -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org