yashmayya commented on code in PR #14163:
URL: https://github.com/apache/pinot/pull/14163#discussion_r1792893017


##########
pinot-integration-test-base/src/test/java/org/apache/pinot/integration/tests/QueryGenerator.java:
##########
@@ -1005,6 +1005,16 @@ public QueryFragment generatePredicate(String columnName, boolean useMultistageE
       List<String> columnValues = _columnToValueList.get(columnName);
       String leftValue = pickRandom(columnValues);
       String rightValue = pickRandom(columnValues);
+
+      if (_singleValueNumericalColumnNames.contains(columnName)) {

Review Comment:
   I'd added it before [this commit](https://github.com/apache/pinot/pull/14163/commits/6e260423bc6c3f39d68ea7656690f6faf2c01c5d)
   removed the `BETWEEN` filter pruning for lower > upper because of this issue:
   https://github.com/apache/pinot/pull/14163#issuecomment-2393718758. I've
   removed this change too now.



##########
pinot-core/src/main/java/org/apache/pinot/core/query/optimizer/filter/NumericalFilterOptimizer.java:
##########
@@ -346,6 +365,156 @@ private static Expression rewriteRangeExpression(Expression range, FilterKind ki
     return range;
   }
 
+  /**
+   * Rewrite expressions of the form "column BETWEEN lower AND upper" to ensure that lower and upper bounds are the same
+   * datatype as the column (or can be cast to the same datatype in the server).
+   */
+  private static Expression rewriteBetweenExpression(Expression between, DataType dataType) {
+    List<Expression> operands = between.getFunctionCall().getOperands();
+    Expression lower = operands.get(1);
+    Expression upper = operands.get(2);
+
+    if (lower.isSetLiteral()) {
+      switch (lower.getLiteral().getSetField()) {
+        case LONG_VALUE: {
+          long actual = lower.getLiteral().getLongValue();
+          // Other data types can be converted on the server side.

Review Comment:
   > Do you mean we should not rewrite BETWEEN the same way as other range filters
   
   Yeah, basically this. Take `intCol >= 2.5` as an example: `2.5` is cast to
   `2` (int), and then the `>=` is rewritten to `>` because
   `actual - converted > 0`, resulting in `intCol > 2`. For `BETWEEN`, we instead
   want to rewrite `intCol BETWEEN 2.5 AND y` to `intCol BETWEEN 3 AND y`. We
   could change the logic for regular range filters to rewrite `intCol >= 2.5` to
   `intCol >= 3` instead, to match the `BETWEEN` rewrite logic. Is that what
   you're suggesting? There are some other differences too, though. For instance,
   `floatCol < longLiteral` can be rewritten to `floatCol <= castedFloatLiteral`
   depending on the comparison between `longLiteral` and `castedFloatLiteral`. We
   can't do the same for `BETWEEN`, so we simply skip any conversion in these
   cases and let the server do the cast. Given these differences, it seemed
   better overall to keep these rewrites separate. What do you think?
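
   To make the difference concrete, here is a minimal standalone sketch of the rewrites described above. It is not the code in `NumericalFilterOptimizer`: the class name, method names, and string output are hypothetical, literals are assumed to fit in the target type, and rounding the `BETWEEN` upper bound down is an assumption made by symmetry with the lower bound.

```java
import java.math.BigDecimal;

// Hypothetical illustration only; none of these names exist in Pinot.
final class BetweenRewriteSketch {
  private BetweenRewriteSketch() {
  }

  /** Range-filter style: cast the literal to int, then adjust the operator. */
  static String rewriteGreaterThanOrEqual(double literal) {
    int converted = (int) literal; // the cast truncates toward zero, e.g. 2.5 -> 2
    // actual - converted > 0 means the literal lies above the converted value,
    // so ">=" has to become ">" to select the same rows.
    String operator = Double.compare(literal, converted) > 0 ? ">" : ">=";
    return "intCol " + operator + " " + converted;
  }

  /** BETWEEN style: the operator cannot change, so the bounds themselves move. */
  static String rewriteBetween(double lower, double upper) {
    int newLower = (int) Math.ceil(lower);  // 2.5 -> 3, as in the comment above
    int newUpper = (int) Math.floor(upper); // assumption: upper bound rounds down
    return "intCol BETWEEN " + newLower + " AND " + newUpper;
  }

  /** Operator for "floatCol < longLiteral" once the literal is cast to float. */
  static String lessThanOperatorForFloatColumn(long literal) {
    float converted = (float) literal; // may round, e.g. 16777217 -> 16777216.0
    int comparison = new BigDecimal(literal).compareTo(new BigDecimal((double) converted));
    // If the exact literal is above the float it was rounded to, "<" must become "<=".
    return comparison > 0 ? "<=" : "<";
  }

  public static void main(String[] args) {
    System.out.println(rewriteGreaterThanOrEqual(2.5));
    System.out.println(rewriteBetween(2.5, 7.5));
    System.out.println(lessThanOperatorForFloatColumn(16777217L));
  }
}
```

   The last method shows why the float case has no `BETWEEN` equivalent: the fix-up lives in the operator, and `BETWEEN` has no operator to adjust.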



##########
pinot-core/src/main/java/org/apache/pinot/core/query/optimizer/filter/NumericalFilterOptimizer.java:
##########
@@ -75,33 +85,41 @@ Expression optimizeChild(Expression filterExpression, @Nullable Schema schema) {
     Function function = filterExpression.getFunctionCall();
     FilterKind kind = FilterKind.valueOf(function.getOperator());
     switch (kind) {
-      case IS_NULL:
-      case IS_NOT_NULL:
-        // No need to try to optimize IS_NULL and IS_NOT_NULL operations on numerical columns.
+      case BETWEEN: {
+        // Verify that value is a numeric column before rewriting.
+        List<Expression> operands = function.getOperands();
+        Expression value = operands.get(0);

Review Comment:
   > Currently for other filter kinds we check whether rhs is numeric but not
   > here. That check itself seems unnecessary, and we can consider consolidating
   > the logic by only checking if lhs is numeric
   
   Good call, it looks a lot cleaner now. I've just retained the `rhs` literal
   check, since the rewrite methods assume that `rhs` is a literal and then
   handle all the data types appropriately anyway. (It should already be
   guaranteed to be a literal, because the compile-time functions invoker and the
   predicate comparison query rewriter run before these optimizers.)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

