amrishlal commented on a change in pull request #6403: URL: https://github.com/apache/incubator-pinot/pull/6403#discussion_r557233144
##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/pruner/ColumnValueSegmentPruner.java
##########
@@ -239,17 +240,24 @@ private boolean pruneRangePredicate(IndexSegment segment, RangePredicate rangePr
     return false;
   }
 
+  /**
+   * Convert String value to specified numerical type. We first verify that the input string contains a number by parsing
+   * it as BigDecimal. The resulting BigDecimal is then downcast to specified numerical type. This allows us to create predicates
+   * which allow for comparing values of two different numerical types such as:
+   * SELECT * FROM table WHERE a > 5.0
+   * SELECT * FROM table WHERE timestamp > NOW() - 5.0.
+   */
   private static Comparable convertValue(String stringValue, DataType dataType) {
     try {
       switch (dataType) {
         case INT:
-          return Integer.valueOf(stringValue);
+          return (new BigDecimal(stringValue)).intValue();

Review comment:
I think this has more to do with how we evaluate predicates on the Calcite / broker side than with UDFs. For example, the following query on the broker side:

`SELECT * FROM mytable WHERE intColumn > 5 - 4`

will get sent to the server as:

`SELECT * FROM mytable WHERE intColumn > 1.0`

which requires comparing an int column with a float value. The same thing carries over to queries such as:

`SELECT * FROM mytable WHERE longColumn > NOW() - 1000`

which are sent to the server side as:

`SELECT * FROM mytable WHERE longColumn > 1.610609874257E12`

> Do they throw error for incompatible comparison?

No. As long as the numerical types used in the predicate can be compared, the query seems to run fine and return results. I tried several variations on both MySQL and PostgreSQL to find an exception or limitation to this rule, but could not find one.

> Downcast RHS before comparison
> Upcast LHS before comparison

I think as long as RHS and LHS can be compared, casting may not be needed.
For example, in our case 300000000000000000.453 is obviously bigger than any integer value, so the predicate `intColumn < 300000000000000000.453` will always be true and we don't need to cast to integer, but the literal still needs to be parsed as a numerical value to carry out some sort of comparison.

> I highly suspect this will have potential impact on performance

If we have a predicate like `intColumn < 3.453`, then we parse "3.453" to a BigDecimal value and then cast it to an integer for carrying out binary search. This is very similar to what would happen with a predicate `intColumn < 3`: there we would parse "3" as an integer for binary search, and if for some reason "3" didn't parse as an integer we would throw an exception. In either case, 3.453 or 3 is parsed as a BigDecimal or an integer only once, just like any other literal in the query, and this parsing is independent of the number of values in the column.
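
To make the literals discussed above concrete, here is a minimal standalone sketch. It is not the code in this PR: the class `NumericLiteralCompare` and its helpers are hypothetical names used only for illustration. It shows that the broker-generated literal `1.610609874257E12` does not parse with `Long.valueOf` but does parse as a `BigDecimal`, and that an int value can be compared against a decimal literal with `BigDecimal.compareTo` without downcasting the literal.

```java
import java.math.BigDecimal;

// Illustrative sketch only; names are hypothetical and not part of Pinot.
public class NumericLiteralCompare {

  // Parse the literal once as BigDecimal, then downcast to long
  // (same pattern as the INT case in the diff, applied to LONG).
  static long toLong(String literal) {
    return new BigDecimal(literal).longValue();
  }

  // Downcast to int; the fractional part is discarded.
  static int toInt(String literal) {
    return new BigDecimal(literal).intValue();
  }

  // Compare an int column value with a numeric literal without casting
  // the literal down. Returns a negative, zero, or positive number.
  static int compareIntToLiteral(int columnValue, String literal) {
    return BigDecimal.valueOf(columnValue).compareTo(new BigDecimal(literal));
  }

  // True if the literal is accepted by Long.valueOf, false otherwise.
  static boolean parsesAsLong(String literal) {
    try {
      Long.valueOf(literal);
      return true;
    } catch (NumberFormatException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    String nowMinus1000 = "1.610609874257E12";
    System.out.println("Long.valueOf accepts literal: " + parsesAsLong(nowMinus1000));
    System.out.println("BigDecimal -> long: " + toLong(nowMinus1000));
    System.out.println("BigDecimal -> int for 3.453: " + toInt("3.453"));
    // Any int is smaller than this literal, so no cast is needed to decide the predicate.
    System.out.println("MAX_VALUE < 300000000000000000.453: "
        + (compareIntToLiteral(Integer.MAX_VALUE, "300000000000000000.453") < 0));
  }
}
```

Because the literal is parsed once per query in both helpers, the cost does not depend on the number of values in the column.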