Jackie-Jiang commented on code in PR #14163:
URL: https://github.com/apache/pinot/pull/14163#discussion_r1792544661

##########
pinot-core/src/main/java/org/apache/pinot/core/query/optimizer/filter/NumericalFilterOptimizer.java:
##########
@@ -346,6 +365,156 @@ private static Expression rewriteRangeExpression(Expression range, FilterKind ki
     return range;
   }

+  /**
+   * Rewrite expressions of the form "column BETWEEN lower AND upper" to ensure that lower and upper bounds are the same
+   * datatype as the column (or can be cast to the same datatype in the server).
+   */
+  private static Expression rewriteBetweenExpression(Expression between, DataType dataType) {
+    List<Expression> operands = between.getFunctionCall().getOperands();
+    Expression lower = operands.get(1);
+    Expression upper = operands.get(2);
+
+    if (lower.isSetLiteral()) {
+      switch (lower.getLiteral().getSetField()) {
+        case LONG_VALUE: {
+          long actual = lower.getLiteral().getLongValue();
+          // Other data types can be converted on the server side.

Review Comment:
   This part I still don't follow. Do you mean we should not rewrite `BETWEEN` the same way as other range filters, or is it too complicated? As long as we have computed the lower and upper bounds, we should be able to assemble them back into a `BETWEEN`. It is also fine to do it separately.

##########
pinot-core/src/main/java/org/apache/pinot/core/query/optimizer/filter/NumericalFilterOptimizer.java:
##########
@@ -75,33 +85,41 @@ Expression optimizeChild(Expression filterExpression, @Nullable Schema schema) {
     Function function = filterExpression.getFunctionCall();
     FilterKind kind = FilterKind.valueOf(function.getOperator());
     switch (kind) {
-      case IS_NULL:
-      case IS_NOT_NULL:
-        // No need to try to optimize IS_NULL and IS_NOT_NULL operations on numerical columns.
+      case BETWEEN: {
+        // Verify that value is a numeric column before rewriting.
+        List<Expression> operands = function.getOperands();
+        Expression value = operands.get(0);

Review Comment:
   (minor) Suggest naming it `lhs`. `value` is a little bit misleading here, as it is usually a column or a function.
   Currently, for other filter kinds we check whether `rhs` is numeric, but not here. That check itself seems unnecessary, and we can consider consolidating the logic by only checking whether `lhs` is numeric.

##########
pinot-integration-test-base/src/test/java/org/apache/pinot/integration/tests/QueryGenerator.java:
##########
@@ -1005,6 +1005,16 @@ public QueryFragment generatePredicate(String columnName, boolean useMultistageE
       List<String> columnValues = _columnToValueList.get(columnName);
       String leftValue = pickRandom(columnValues);
       String rightValue = pickRandom(columnValues);
+
+      if (_singleValueNumericalColumnNames.contains(columnName)) {

Review Comment:
   Did you add this in order for the test to pass? We probably want to test scenarios where the lower bound is larger than the upper bound (the always-false scenario).
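
   To make the "compute the lower and upper bounds, then assemble them back into a `BETWEEN`" idea and the always-false scenario concrete, here is a minimal sketch. It is not the code in this PR and does not use Pinot's `Expression` API: `BetweenBoundsSketch` and `narrowToInt` are hypothetical names, and it assumes an INT column compared against LONG literals with inclusive bounds.

```java
import java.util.Optional;

// Hypothetical helper for illustration only; not part of Pinot.
final class BetweenBoundsSketch {
  private BetweenBoundsSketch() {
  }

  /**
   * Narrows the LONG literal bounds of "intColumn BETWEEN lower AND upper" to the INT domain.
   * Returns an empty Optional when no INT value can satisfy the predicate (always false),
   * otherwise a two-element array {lower, upper} that could be used to rebuild the BETWEEN.
   */
  static Optional<int[]> narrowToInt(long lower, long upper) {
    // lower > upper can never match, and neither can a range entirely outside the INT domain.
    if (lower > upper || lower > Integer.MAX_VALUE || upper < Integer.MIN_VALUE) {
      return Optional.empty();
    }
    // Clamp each bound into the INT domain; the checks above guarantee the casts are lossless.
    int narrowedLower = (int) Math.max(lower, Integer.MIN_VALUE);
    int narrowedUpper = (int) Math.min(upper, Integer.MAX_VALUE);
    return Optional.of(new int[]{narrowedLower, narrowedUpper});
  }
}
```

   With this sketch, `narrowToInt(5L, 3_000_000_000L)` yields the bounds 5 and `Integer.MAX_VALUE`, while `narrowToInt(10L, 1L)` yields an empty result, which is the case a generated test query with lower larger than upper would exercise. How the real optimizer represents an always-false filter is left open here.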