coderfender commented on issue #21116: URL: https://github.com/apache/datafusion/issues/21116#issuecomment-4152374181
IMHO there is no single right way to represent timestamps with varying timezone offsets while still fully leveraging vectorized (SIMD) execution. The two options I see:

1. Store every timestamp together with its timezone info.
2. Group timestamps that share the same timezone offset so each group can be processed with SIMD. This could increase memory usage.

We could support both approaches plus a hybrid implementation that chooses between them per batch, based on the variance in timezones within that batch and a `CARDINALITY_FACTOR`-style parameter that decides which approach is better. I'll raise a draft PR next week with a POC and some benchmarks.
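A minimal Rust sketch of the per-batch hybrid idea follows. It assumes `CARDINALITY_FACTOR` is the ratio of distinct offsets to rows in a batch (the comment does not define it), and models offsets as fixed seconds east of UTC. All type and function names (`PerRowOffsets`, `GroupedByOffset`, `choose_layout`, `to_local`) are hypothetical illustrations, not DataFusion or Arrow APIs.

```rust
use std::collections::{HashMap, HashSet};

const NANOS_PER_SEC: i64 = 1_000_000_000;

/// Hypothetical knob (assumption): group rows by offset when
/// (distinct offsets / rows) is at or below this ratio.
const CARDINALITY_FACTOR: f64 = 0.1;

/// Approach 1 (hypothetical layout): every row carries its own offset.
#[derive(Debug)]
struct PerRowOffsets {
    utc_nanos: Vec<i64>,
    offset_secs: Vec<i32>,
}

/// Approach 2 (hypothetical layout): rows sharing an offset are stored
/// contiguously; `row_ids` restores input order and is the extra memory cost.
#[derive(Debug)]
struct GroupedByOffset {
    groups: Vec<(i32, Vec<i64>)>,
    row_ids: Vec<Vec<usize>>,
    len: usize,
}

#[derive(Debug)]
enum Layout {
    PerRow(PerRowOffsets),
    Grouped(GroupedByOffset),
}

fn group_by_offset(utc_nanos: &[i64], offset_secs: &[i32]) -> GroupedByOffset {
    let mut index: HashMap<i32, usize> = HashMap::new();
    let mut groups: Vec<(i32, Vec<i64>)> = Vec::new();
    let mut row_ids: Vec<Vec<usize>> = Vec::new();
    for (row, (&ts, &off)) in utc_nanos.iter().zip(offset_secs).enumerate() {
        // First time we see an offset, open a new group for it.
        let g = *index.entry(off).or_insert_with(|| {
            groups.push((off, Vec::new()));
            row_ids.push(Vec::new());
            groups.len() - 1
        });
        groups[g].1.push(ts);
        row_ids[g].push(row);
    }
    GroupedByOffset { groups, row_ids, len: utc_nanos.len() }
}

/// Hybrid: pick a layout for one batch from its distinct-offset ratio.
fn choose_layout(utc_nanos: Vec<i64>, offset_secs: Vec<i32>) -> Layout {
    assert_eq!(utc_nanos.len(), offset_secs.len());
    let distinct: HashSet<i32> = offset_secs.iter().copied().collect();
    let ratio = if utc_nanos.is_empty() {
        0.0
    } else {
        distinct.len() as f64 / utc_nanos.len() as f64
    };
    if ratio <= CARDINALITY_FACTOR {
        Layout::Grouped(group_by_offset(&utc_nanos, &offset_secs))
    } else {
        Layout::PerRow(PerRowOffsets { utc_nanos, offset_secs })
    }
}

/// Convert to local wall-clock nanoseconds, returned in original row order.
fn to_local(layout: &Layout) -> Vec<i64> {
    match layout {
        Layout::PerRow(p) => p
            .utc_nanos
            .iter()
            .zip(&p.offset_secs)
            .map(|(&ts, &off)| ts + off as i64 * NANOS_PER_SEC)
            .collect(),
        Layout::Grouped(g) => {
            let mut out = vec![0i64; g.len];
            for ((off, vals), rows) in g.groups.iter().zip(&g.row_ids) {
                // One constant delta per group: a tight loop the compiler can vectorize.
                let delta = *off as i64 * NANOS_PER_SEC;
                let shifted: Vec<i64> = vals.iter().map(|&v| v + delta).collect();
                for (&row, &v) in rows.iter().zip(&shifted) {
                    out[row] = v;
                }
            }
            out
        }
    }
}

fn main() {
    // Batch A: 40 rows, 2 distinct offsets (ratio 0.05) -> grouped.
    let utc_a: Vec<i64> = (0..40).map(|i| i * NANOS_PER_SEC).collect();
    let off_a: Vec<i32> = (0..40).map(|i| if i % 2 == 0 { 3600 } else { -18000 }).collect();
    let a = choose_layout(utc_a, off_a);
    // Batch B: 4 rows, 3 distinct offsets (ratio 0.75) -> per-row.
    let b = choose_layout(
        vec![0, NANOS_PER_SEC, 2 * NANOS_PER_SEC, 3 * NANOS_PER_SEC],
        vec![0, 3600, 19800, 0],
    );
    println!("A grouped: {}", matches!(a, Layout::Grouped(_)));
    println!("B grouped: {}", matches!(b, Layout::Grouped(_)));
    println!("A[1] local nanos: {}", to_local(&a)[1]);
}
```

Both layouts produce the same local values; the grouped one trades the extra `row_ids` bookkeeping for a constant-offset inner loop per group, which is the memory-versus-SIMD tradeoff the comment describes. A real benchmark would be needed to pick a sensible threshold.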
