amrishlal opened a new pull request #6403:
URL: https://github.com/apache/incubator-pinot/pull/6403


   This PR is motivated by a discussion thread in the Apache Pinot #general Slack 
channel (see the Dec 18th, 2020 post by Will Briggs), where we noticed that the 
query `SELECT COUNT(*) FROM myTable WHERE eventTimestamp <= NOW() - 10000` was 
not working. Upon investigation, it was discovered that:
   
   - This was an issue with upcasting and downcasting of numerical values in 
predicates.
   - Any query that compared two different numerical types wouldn't work due to 
the type casting issue. For example, `SELECT count(*) FROM testTable WHERE 
intColumn > 15.1`.
   - Constant expressions are evaluated by Calcite and converted into a decimal 
value string, so a query such as `SELECT count(*) FROM testTable WHERE 
intColumn = 5 - 4` is transformed by Calcite into the query `SELECT count(*) 
FROM testTable WHERE intColumn = 1.0` before evaluation on the server side. 
Since the rewritten query compares an INT column with a decimal literal, it 
fails to evaluate.
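
   As a rough illustration of why a decimal literal is awkward for an INT 
column, the sketch below uses only plain Java parsing. The class and method 
names are hypothetical and are not taken from the Pinot code base; whether 
Pinot failed in exactly this way is an assumption.

```java
// Hypothetical sketch: not Pinot code, just standard java.lang parsing behavior.
final class MixedTypeLiteralSketch {

    // Integer.parseInt rejects any string with a decimal point, such as "1.0".
    static boolean parsesAsInt(String literal) {
        try {
            Integer.parseInt(literal);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Parsing as a double first and then narrowing accepts "1.0" and yields 1.
    // The fractional part, if any, is dropped by the narrowing conversion.
    static int viaDouble(String literal) {
        return Double.valueOf(literal).intValue();
    }

    public static void main(String[] args) {
        System.out.println(parsesAsInt("1.0"));
        System.out.println(viaDouble("1.0"));
        System.out.println(viaDouble("15.1"));
    }
}
```

   Note that `viaDouble("15.1")` drops the fraction, so a range predicate such 
as `intColumn > 15.1` needs more care than a plain cast.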
   
   This PR fixes type casting of numerical values used in predicates, to allow 
the queries shown above to work. The following code modifications were made:
   - All numerical value strings used in predicates are first parsed as 
`java.lang.Double` and then downcast to the numerical type of the column (see 
changes in `ColumnValueSegmentPruner`).
   - `Double.valueOf(stringValue).intValue()` will convert the value 
3_000_000_000 into `Integer.MAX_VALUE`, since 3_000_000_000 exceeds 
`Integer.MAX_VALUE`. This is problematic because an INT column may actually 
contain `Integer.MAX_VALUE` and `Integer.MIN_VALUE`, and this will cause queries 
such as `SELECT count(*) from table where intColumn = 3_000_000_000` to return 
a count of 1 when they should actually return a count of 0. To properly 
account for the MIN/MAX sentinel values, we modified code in `IntDictionary`, 
`LongDictionary`, and `FloatDictionary`. This allows for properly evaluating 
queries such as `SELECT count(*) from table where intColumn < 3_000_000_000` 
and `SELECT count(*) from table where intColumn = 3_000_000_000` (see 
`NumericalPredicateTest`).
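
   The saturation described in the last bullet can be reproduced with plain 
Java. The sketch below is a minimal illustration, not the code from this PR: 
the class `IntLiteralSketch` and the method `exactInt` are hypothetical names, 
and the range check is only one possible way to tell an out-of-range literal 
apart from a genuine `Integer.MAX_VALUE`. The literal is written without 
underscores because `Double.valueOf` does not accept them.

```java
import java.util.OptionalInt;

// Hypothetical sketch: illustrates the saturation problem, not the PR's fix.
final class IntLiteralSketch {

    // Narrowing a double to int saturates at Integer.MIN_VALUE / MAX_VALUE,
    // so an out-of-range literal becomes indistinguishable from the sentinel.
    static int naiveToInt(String literal) {
        return Double.valueOf(literal).intValue();
    }

    // Returns the int value only when the literal is a whole number that fits
    // in an int; otherwise returns empty so the caller can treat "= literal"
    // as matching nothing.
    static OptionalInt exactInt(String literal) {
        double d = Double.parseDouble(literal);
        if (d < Integer.MIN_VALUE || d > Integer.MAX_VALUE || d != Math.rint(d)) {
            return OptionalInt.empty();
        }
        return OptionalInt.of((int) d);
    }

    public static void main(String[] args) {
        System.out.println(naiveToInt("3000000000") == Integer.MAX_VALUE);
        System.out.println(exactInt("3000000000").isPresent());
        System.out.println(exactInt("2147483647").isPresent());
    }
}
```

   With a check of this kind, an equality predicate on 3_000_000_000 can be 
treated as matching no rows, while a literal equal to `Integer.MAX_VALUE` 
still matches rows that actually hold that value.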
   
   Changes in functionality and expected results were confirmed by using 
Postgres as the reference database.
   


