amrishlal opened a new pull request #6403: URL: https://github.com/apache/incubator-pinot/pull/6403
This PR is motivated by a discussion thread in the Apache Pinot #general Slack channel (see the Dec 18th, 2020 post by Will Briggs), where we noticed that the query `SELECT COUNT(*) FROM myTable WHERE eventTimestamp <= NOW() - 10000` was not working. Upon investigation, it was discovered that:

- This was an issue with upcasting and downcasting of numerical values in predicates.
- Any query that compared two different numerical types wouldn't work due to the type casting issue. For example, `SELECT count(*) FROM testTable WHERE intColumn > 15.1`.
- Constant expressions are evaluated by Calcite and converted into a decimal value string, so a query such as `SELECT count(*) FROM testTable WHERE intColumn = 5 - 4` is transformed by Calcite into the query `SELECT count(*) FROM testTable WHERE intColumn = 1.0` before evaluation on the server side. Since the query then compares two different numerical types, it fails to evaluate.

This PR fixes type casting of numerical values used in predicates so that the queries shown above work. The following code modifications were made:

- All numerical value strings used in predicates are first parsed as `java.lang.Double` and then downcast to the numerical type of the column (see changes in `ColumnValueSegmentPruner`).
- `Double.valueOf(stringValue).intValue()` converts the value 3_000_000_000 into `Integer.MAX_VALUE`, since 3_000_000_000 exceeds `Integer.MAX_VALUE`. This is problematic because an INT column may actually contain `Integer.MAX_VALUE` and `Integer.MIN_VALUE`, which causes queries such as `SELECT count(*) from table where intColumn = 3_000_000_000` to return a count of 1 when they should return a count of 0. To properly account for the MIN/MAX sentinel values, we modified code in `IntDictionary`, `LongDictionary`, and `FloatDictionary`.
This allows for properly evaluating queries such as `SELECT count(*) from table where intColumn < 3_000_000_000` and `SELECT count(*) from table where intColumn = 3_000_000_000` (see `NumericalPredicateTest`).

Changes in functionality and expected results were confirmed by using Postgres as the reference database.
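
The saturation problem described in this PR can be reproduced in a few lines of plain Java. The sketch below is illustrative only: `NumericPredicateSketch` and its methods are hypothetical names, not code from `IntDictionary` or `ColumnValueSegmentPruner`, and the exact `BigDecimal` comparison is an assumed alternative technique, not necessarily how the PR handles out-of-range values. The literal is written as `3000000000` because `Double.valueOf(String)` does not accept underscores.

```java
import java.math.BigDecimal;

// Hypothetical helper for illustration only; not code from the Pinot PR.
final class NumericPredicateSketch {

    // Naive conversion: parse the literal as a double, then narrow to int.
    // Java's double-to-int narrowing saturates at Integer.MIN_VALUE / Integer.MAX_VALUE.
    static int naiveToInt(String literal) {
        return Double.valueOf(literal).intValue();
    }

    // Assumed alternative: compare exactly, without narrowing the literal.
    // Returns a negative number, zero, or a positive number, like Comparable.compareTo.
    static int compareIntToLiteral(int columnValue, String literal) {
        return BigDecimal.valueOf(columnValue).compareTo(new BigDecimal(literal));
    }

    // True only if the column value is numerically equal to the literal.
    static boolean equalsLiteral(int columnValue, String literal) {
        return compareIntToLiteral(columnValue, literal) == 0;
    }

    public static void main(String[] args) {
        String tooLarge = "3000000000"; // 3_000_000_000, above Integer.MAX_VALUE

        // The naive path collapses the literal onto Integer.MAX_VALUE.
        System.out.println(naiveToInt(tooLarge));                      // 2147483647
        System.out.println(naiveToInt(tooLarge) == Integer.MAX_VALUE); // true

        // The exact comparison keeps a stored Integer.MAX_VALUE distinct from the literal.
        System.out.println(equalsLiteral(Integer.MAX_VALUE, tooLarge));           // false
        System.out.println(compareIntToLiteral(Integer.MAX_VALUE, tooLarge) < 0); // true

        // Mixed-type literals from the examples above.
        System.out.println(equalsLiteral(1, "1.0"));              // true
        System.out.println(compareIntToLiteral(16, "15.1") > 0);  // true
    }
}
```

`BigDecimal.compareTo` ignores scale, so `1` and `1.0` compare as equal, which matches the `intColumn = 1.0` case produced by Calcite's constant folding.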