zzzzming95 commented on PR #9905: URL: https://github.com/apache/iceberg/pull/9905#issuecomment-2236449363
> On #9784 you mentioned:
>
> > I think this is because hive and spark treat `timestamp` data type as timestamp with time zone and the orc file format is also stored as orc `timestamp` type. But in fact the hive `timestamp` data type should be stored as `timestamp_instant` in the orc file.
>
> This sounds like Hive and Spark treat the `timestamp` type in a "less flexible way". IIRC Spark 3.4 introduced `timestamp_ntz`, whereas Hive uses no timezone at all.
>
> The `iceberg.orc.convert.timestamptz` option introduced with this PR seems to be a global setting, so it affects all tables. I wonder whether this should rather be an ORC type property.

Although the `timestamp_ntz` type was introduced in Spark 3.4+, it is actually stored as an `array<bigint>` data type in the ORC file, not as an ORC timestamp type. The purpose of `iceberg.orc.convert.timestamptz` is to give Hive and Spark a compatible way to read the ORC `timestamp` data type.

I am not quite sure what "I wonder whether this should rather be an ORC type property" means. If it means adding a new type to ORC, such as `timestamptz`, that would still not solve the incompatible reads of historical ORC `timestamp` data.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
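As an illustration only, since the PR introduces `iceberg.orc.convert.timestamptz` as a global setting: a session-wide way to enable it might look like the sketch below. The `spark.hadoop.` prefix is an assumption about how the option would be surfaced to Spark's Hadoop configuration, not something confirmed by the PR.

```shell
# Hypothetical sketch: pass the Iceberg ORC option through Spark's Hadoop
# configuration, so that ORC `timestamp` columns are treated as timestamptz
# for all tables read in this session. The `spark.hadoop.` prefix is an
# assumed wiring, not confirmed by the PR.
spark-sql \
  --conf spark.hadoop.iceberg.orc.convert.timestamptz=true
```

Because the option applies to every table in the session, this is exactly the global behavior the reviewer was questioning, as opposed to a per-table or per-type property.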