cyrilou242 opened a new issue, #12091:
URL: https://github.com/apache/pinot/issues/12091

   ## Problem observed
   `DATETIMECONVERT(date_col, config...)` does not return the correct type when 
`date_col` is used in the `GROUP BY`clause.
   
   **The issue may not be specific to the DATETIMECONVERT transform, but I have 
not tested with other transforms.**
   
   ## Example
   Using DATETIMECONVERT to transform a string column in an EPOCH format, the 
expected output type is LONG
   ```
   SELECT
   DATETIMECONVERT("date", '1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd', 
'1:MILLISECONDS:EPOCH', '1:DAYS') AS ts,
   FROM pageviews
   LIMIT 10
   ```
   returns
   ```
   "dataSchema": {
         "columnNames": [
           "ts"
         ],
         "columnDataTypes": [
           "LONG"
         ]
       },
       "rows": [
         [
           1580601600000
         ],
         ...
   ```
   The `ts` column datatype is `LONG` ✅ 
   
   When we use the ts column to group by, the column datatype is still a long
   
   ```
   SELECT
   DATETIMECONVERT("date", '1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd', 
'1:MILLISECONDS:EPOCH', '1:DAYS') AS ts,
   count(*) AS met
   FROM pageviews
   GROUP BY ts
   LIMIT 10
   ```
   
   Returns 
   ```
   "dataSchema": {
         "columnNames": [
           "ts",
           "met"
         ],
         "columnDataTypes": [
           "LONG",
           "LONG"
         ]
       },
       "rows": [
         [
           1581638400000,
           1800
         ],
         ...
   ```
   
   The `ts` column datatype is `LONG` ✅ 
   
   But when grouping by the transformed column directly, the type is different: 
   ```
   SELECT
   DATETIMECONVERT("date", '1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd', 
'1:MILLISECONDS:EPOCH', '1:DAYS') AS ts,
   count(*) AS met
   FROM pageviews
   GROUP BY "date"  -- change is here
   LIMIT 10
   ```
   (notice: `GROUP by "date"`)
   returns 
   ```
   "dataSchema": {
         "columnNames": [
           "ts",
           "met"
         ],
         "columnDataTypes": [
           "STRING",
           "LONG"
         ]
       },
       "rows": [
         [
           "1581206400000",
           1800
         ],
         ...
   ```
   --> `ts` column is of type `STRING` ⚠️ 
   
   **Bug**: It looks like the returned type is the type of of the input time 
column instead of the type of the output.
   
   The data to reproduce is available here: 
   https://github.com/startreedata/thirdeye/tree/master/examples/pageviews 
   
   If this helps: in most recent versions - or maybe it depends on the type of 
the original column -  the incorrect type column can also be of type `OBJECT`.
   I can provide an internal online env to reproduce if this helps.
   
   ## Versions: 
   The issue can be reproduced in `0.12.1`, `1.1.0-ST.19.3` and `1.1.0-ST.19.5`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to