zzzzming95 commented on PR #9905: URL: https://github.com/apache/iceberg/pull/9905#issuecomment-2236449363
> On #9784 you mentioned:
>
> > I think this is because hive and spark treat `timestamp` data type as timestamp with time zone and the orc file format is also stored as orc `timestamp` type. But in fact the hive `timestamp` data type should be stored as `timestamp_instant` in the orc file.
>
> This sounds like Hive and Spark treat the `timestamp` type in a "less flexible way". IIRC Spark 3.4 introduced `timestamp_ntz`, whereas Hive uses no timezone at all.
>
> The `iceberg.orc.convert.timestamptz` option introduced with this PR seems to be a global setting, so it affects all tables. I wonder whether this should rather be an ORC type property.

Although the `timestamp_ntz` type was introduced in Spark 3.4+, it is actually stored as an `array<bigint>` data type in the ORC file, not as an ORC timestamp type. The purpose of `iceberg.orc.convert.timestamptz` is to give Hive and Spark a compatible way to read the ORC `timestamp` data type.

I am not quite sure what "I wonder whether this should rather be an ORC type property" means. If it means adding a new type to ORC, such as `timestamptz`, that would still not solve the incompatible reads of historical ORC `timestamp` data.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
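As an illustration only, since the PR introduces `iceberg.orc.convert.timestamptz` as a global setting: a session-wide way to enable it might look like the sketch below. The `spark.hadoop.` prefix is an assumption about how the option would be surfaced to Spark's Hadoop configuration, not something confirmed by the PR.

```shell
# Hypothetical sketch: pass the Iceberg ORC option through Spark's Hadoop
# configuration, so that ORC `timestamp` columns are treated as timestamptz
# for all tables read in this session. The `spark.hadoop.` prefix is an
# assumed wiring, not confirmed by the PR.
spark-sql \
  --conf spark.hadoop.iceberg.orc.convert.timestamptz=true
```

Because the option applies to every table in the session, this is exactly the global behavior the reviewer was questioning, as opposed to a per-table or per-type property.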