cyrilou242 opened a new issue, #8581:
URL: https://github.com/apache/pinot/issues/8581

   ## Problem observed.
   DatetimeConvert buckets with a given granularity. Buckets depend on the 
timezone.
   When returning in a SimpleDateFormat, bucketing takes the timezone into 
account. (https://github.com/apache/pinot/pull/2295)
   When returning in EPOCH format, it is not possible to pass a timezone. 
Bucketing is in UTC timezone. 
   
   To do a timezone aware bucketing with epoch output, my current workaround is 
to use `datetimeConvert` to a SimpleDateFormat then use `FromDateTime` to 
convert back to epoch. See `workaround` below. This is not convenient because 
this involves string parsing. 
   
   This was discussed here too https://github.com/apache/pinot/pull/4242. 
   
   ## Example
   I want to convert an epoch:seconds into an epoch milliseconds and bucket by 
day granularity. 
   I want to bucket in CET timezone.
   
   My epoch corresponds to `April 16 00:00` in CET, but `April 15 22:00` in 
UTC. 
   When bucketing in CET timezone, the bucket is  `April 16 00:00`. When 
bucketing in UTC, the bucket is `April 15 00:00`
   I want the output format to be epoch. I can't bucket in CET timezone.
   
   ```
   select
   report_date_epoch,
   ToDateTime(report_date_epoch*1000, 'yyyy-MM-dd HH:mm:ss', 'UTC') as 
dtStringWithUtcTz,
   ToDateTime(report_date_epoch*1000, 'yyyy-MM-dd HH:mm:ss', 'CET') as 
dtStringWithCetTz,
   dateTimeConvert(report_date_epoch, '1:SECONDS:EPOCH', 
'1:MILLISECONDS:EPOCH', '1:DAYS') as dtGroupedEpochNoTz,
   dateTimeConvert(report_date_epoch, '1:SECONDS:EPOCH', 
'1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss', '1:DAYS') as 
dtGroupedStringNoTz,
   dateTimeConvert(report_date_epoch, '1:SECONDS:EPOCH', 
'1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.mmm tz(CET)', '1:DAYS') as 
dtGroupedStringWithTz,
   dateTimeConvert(report_date_epoch, '1:SECONDS:EPOCH', 
'1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSSZ  tz(CET)', '1:DAYS') as 
dtGroupedStringWithTzVisible,
   // not possible to do dtGroupedEpochWithTz
   // current workaround 
   FromDateTime(dateTimeConvert(report_date_epoch, '1:SECONDS:EPOCH', 
'1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSSZ tz(CET)', '1:DAYS'), 
'yyyy-MM-dd HH:mm:ss.SSSZ') as workaround
   from orders
   limit 1
   ```
   
   | report_date_epoch | dtStringWithUtcTz   | dtStringWithCetTz   | 
dtGroupedEpochNoTz | dtGroupedStringNoTz | dtGroupedStringWithTz   | 
dtGroupedStringWithTzVisible | workaround    |
   
|-------------------|---------------------|---------------------|--------------------|---------------------|-------------------------|------------------------------|---------------|
   | 1650060000        | 2022-04-15 22:00:00 | 2022-04-16 00:00:00 | 
1649980800000      | 2022-04-15 00:00:00 | 2022-04-16 00:00:00.000 | 2022-04-16 
00:00:00.000+0200 | 1650060000000 |
   
   ## Proposal 
   It'd be nice to be able to pass a timezone in `outputFormat` when output is 
epoch. 
   Eg: `1:MILLISECONDS:EPOCH:Europe/Amsterdam`
   
   I understand this could impact performance when doing epoch to epoch 
conversion, because in this case no conversion to datetime is made to do the 
bucketing. 
   I think we could have the following logic: 
   - no timezone given --> work as usual --> no conversion for bucketing
   - timezone is given --> convert epoch to datetime, do the bucketing, convert 
back to epoch
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to