cyrilou242 opened a new issue, #8581: URL: https://github.com/apache/pinot/issues/8581
## Problem observed. DatetimeConvert buckets with a given granularity. Buckets depend on the timezone. When returning in a SimpleDateFormat, bucketing takes the timezone into account. (https://github.com/apache/pinot/pull/2295) When returning in EPOCH format, it is not possible to pass a timezone. Bucketing is in UTC timezone. To do a timezone aware bucketing with epoch output, my current workaround is to use `datetimeConvert` to a SimpleDateFormat then use `FromDateTime` to convert back to epoch. See `workaround` below. This is not convenient because this involves string parsing. This was discussed here too https://github.com/apache/pinot/pull/4242. ## Example I want to convert an epoch:seconds into an epoch milliseconds and bucket by day granularity. I want to bucket in CET timezone. My epoch corresponds to `April 16 00:00` in CET, but `April 15 22:00` in UTC. When bucketing in CET timezone, the bucket is `April 16 00:00`. When bucketing in UTC, the bucket is `April 15 00:00` I want the output format to be epoch. I can't bucket in CET timezone. ``` select report_date_epoch, ToDateTime(report_date_epoch*1000, 'yyyy-MM-dd HH:mm:ss', 'UTC') as dtStringWithUtcTz, ToDateTime(report_date_epoch*1000, 'yyyy-MM-dd HH:mm:ss', 'CET') as dtStringWithCetTz, dateTimeConvert(report_date_epoch, '1:SECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '1:DAYS') as dtGroupedEpochNoTz, dateTimeConvert(report_date_epoch, '1:SECONDS:EPOCH', '1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss', '1:DAYS') as dtGroupedStringNoTz, dateTimeConvert(report_date_epoch, '1:SECONDS:EPOCH', '1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.mmm tz(CET)', '1:DAYS') as dtGroupedStringWithTz, dateTimeConvert(report_date_epoch, '1:SECONDS:EPOCH', '1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSSZ tz(CET)', '1:DAYS') as dtGroupedStringWithTzVisible, // not possible to do dtGroupedEpochWithTz // current workaround FromDateTime(dateTimeConvert(report_date_epoch, '1:SECONDS:EPOCH', '1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSSZ tz(CET)', '1:DAYS'), 'yyyy-MM-dd HH:mm:ss.SSSZ') as workaround from orders limit 1 ``` | report_date_epoch | dtStringWithUtcTz | dtStringWithCetTz | dtGroupedEpochNoTz | dtGroupedStringNoTz | dtGroupedStringWithTz | dtGroupedStringWithTzVisible | workaround | |-------------------|---------------------|---------------------|--------------------|---------------------|-------------------------|------------------------------|---------------| | 1650060000 | 2022-04-15 22:00:00 | 2022-04-16 00:00:00 | 1649980800000 | 2022-04-15 00:00:00 | 2022-04-16 00:00:00.000 | 2022-04-16 00:00:00.000+0200 | 1650060000000 | ## Proposal It'd be nice to be able to pass a timezone in `outputFormat` when output is epoch. Eg: `1:MILLISECONDS:EPOCH:Europe/Amsterdam` I understand this could impact performance when doing epoch to epoch conversion, because in this case no conversion to datetime is made to do the bucketing. I think we could have the following logic: - no timezone given --> work as usual --> no conversion for bucketing - timezone is given --> convert epoch to datetime, do the bucketing, convert back to epoch -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org