yashmayya commented on code in PR #14163: URL: https://github.com/apache/pinot/pull/14163#discussion_r1792893017
##########
pinot-integration-test-base/src/test/java/org/apache/pinot/integration/tests/QueryGenerator.java:
##########
@@ -1005,6 +1005,16 @@ public QueryFragment generatePredicate(String columnName, boolean useMultistageE
     List<String> columnValues = _columnToValueList.get(columnName);
     String leftValue = pickRandom(columnValues);
     String rightValue = pickRandom(columnValues);
+
+    if (_singleValueNumericalColumnNames.contains(columnName)) {

Review Comment:
   I'd added it before [this commit](https://github.com/apache/pinot/pull/14163/commits/6e260423bc6c3f39d68ea7656690f6faf2c01c5d), which removed the `BETWEEN` filter pruning for lower > higher because of this issue: https://github.com/apache/pinot/pull/14163#issuecomment-2393718758. I've now removed this change too.

##########
pinot-core/src/main/java/org/apache/pinot/core/query/optimizer/filter/NumericalFilterOptimizer.java:
##########
@@ -346,6 +365,156 @@ private static Expression rewriteRangeExpression(Expression range, FilterKind ki
     return range;
   }
+  /**
+   * Rewrite expressions of the form "column BETWEEN lower AND upper" to ensure that lower and upper bounds are the same
+   * datatype as the column (or can be cast to the same datatype in the server).
+   */
+  private static Expression rewriteBetweenExpression(Expression between, DataType dataType) {
+    List<Expression> operands = between.getFunctionCall().getOperands();
+    Expression lower = operands.get(1);
+    Expression upper = operands.get(2);
+
+    if (lower.isSetLiteral()) {
+      switch (lower.getLiteral().getSetField()) {
+        case LONG_VALUE: {
+          long actual = lower.getLiteral().getLongValue();
+          // Other data types can be converted on the server side.

Review Comment:
   > Do you mean we should not rewrite BETWEEN the same way as other range filters

   Yeah, basically this. Take `intCol >= 2.5` as an example: `2.5` is cast to `2` (int), and then the `>=` is rewritten to `>` because `actual - converted > 0`, resulting in `intCol > 2`.
   For `BETWEEN`, we instead want to rewrite `intCol BETWEEN 2.5 AND y` to `intCol BETWEEN 3 AND y`. We could change the logic for the regular range filters to rewrite `intCol >= 2.5` to `intCol >= 3` instead, to match the `BETWEEN` rewrite logic. Is that what you're suggesting? There are some other differences too, though. For instance, `floatCol < longLiteral` can be rewritten to `floatCol <= castedFloatLiteral`, depending on the comparison between `longLiteral` and `castedFloatLiteral`. We can't do the same for `BETWEEN`, so we simply skip any conversion in these cases and let the server do the cast. Given these differences, it seemed better overall to keep these rewrites separate. What do you think?

##########
pinot-core/src/main/java/org/apache/pinot/core/query/optimizer/filter/NumericalFilterOptimizer.java:
##########
@@ -75,33 +85,41 @@ Expression optimizeChild(Expression filterExpression, @Nullable Schema schema) {
     Function function = filterExpression.getFunctionCall();
     FilterKind kind = FilterKind.valueOf(function.getOperator());
     switch (kind) {
-      case IS_NULL:
-      case IS_NOT_NULL:
-        // No need to try to optimize IS_NULL and IS_NOT_NULL operations on numerical columns.
+      case BETWEEN: {
+        // Verify that value is a numeric column before rewriting.
+        List<Expression> operands = function.getOperands();
+        Expression value = operands.get(0);

Review Comment:
   > Currently for other filter kinds we check whether rhs is numeric but not here. That check itself seems unnecessary, and we can consider consolidating the logic by only checking if lhs is numeric

   Good call, it looks a lot cleaner now. I've retained only the `rhs` literal check, since the rewrite methods assume that `rhs` is a literal and then handle all the data types appropriately anyway (although it should already be guaranteed to be a literal, because the compile time functions invoker and the predicate comparison query rewriter run before these optimizers).
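
To make the contrast between the range rewrite and the `BETWEEN` rewrite discussed above concrete, here is a minimal, self-contained sketch. It is not Pinot's code: the class `BetweenRewriteSketch` and both method names are hypothetical, it only models an int column with double literals, and it ignores overflow. Flooring the upper bound is an assumption that follows from `BETWEEN` being inclusive on both ends; the comment above only spells out the lower-bound case.

```java
public final class BetweenRewriteSketch {
  private BetweenRewriteSketch() {
  }

  /**
   * Models the range-filter style rewrite of "column >= literal" for an int column:
   * cast the literal first, then adjust the operator based on (actual - converted).
   */
  static String rewriteGreaterOrEqual(String column, double literal) {
    int converted = (int) literal; // truncates toward zero, e.g. 2.5 -> 2
    // A positive difference means the cast lowered the bound, so ">=" must become ">".
    String operator = (literal - converted > 0) ? ">" : ">=";
    return column + " " + operator + " " + converted;
  }

  /**
   * Models the BETWEEN style rewrite for an int column. The operator cannot change,
   * so the bounds themselves are tightened: lower is rounded up, upper is rounded down
   * (the upper-bound handling is an assumption of this sketch).
   */
  static String rewriteBetween(String column, double lower, double upper) {
    int newLower = (int) Math.ceil(lower);
    int newUpper = (int) Math.floor(upper);
    return column + " BETWEEN " + newLower + " AND " + newUpper;
  }

  public static void main(String[] args) {
    System.out.println(rewriteGreaterOrEqual("intCol", 2.5)); // intCol > 2
    System.out.println(rewriteBetween("intCol", 2.5, 7.5));   // intCol BETWEEN 3 AND 7
  }
}
```

Both rewrites accept the same set of int values for the lower bound (`intCol > 2` and `intCol >= 3` are equivalent); the difference is that a range filter can absorb the rounding into its operator, while `BETWEEN` has to absorb it into the literal.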