Re: [PR] Cast 's', 'ms' and 'ns' PyArrow timestamp to 'us' precision on write [iceberg-python]

via GitHub Fri, 05 Jul 2024 02:38:48 -0700


Fokko commented on code in PR #848:
URL: https://github.com/apache/iceberg-python/pull/848#discussion_r1666586081



##########
pyiceberg/io/pyarrow.py:
##########
@@ -918,11 +919,24 @@ def primitive(self, primitive: pa.DataType) -> PrimitiveType:
             return TimeType()
         elif pa.types.is_timestamp(primitive):
             primitive = cast(pa.TimestampType, primitive)
-            if primitive.unit == "us":
-                if primitive.tz == "UTC" or primitive.tz == "+00:00":
-                    return TimestamptzType()
-                elif primitive.tz is None:
-                    return TimestampType()
+            if primitive.unit in ("s", "ms", "us"):
+                # Supported types, will be upcast automatically to 'us'
+                pass
+            elif primitive.unit == "ns":
+                if Config().get_bool("downcast-ns-timestamp-on-write"):

Review Comment:
   > Making sure the add_files API is as safe as possible while still being 
performant sounds like a good idea to me.
   
   One of the downsides of `add_files` is that the field IDs are not written 
to the Parquet files. This limits schema evolution to some extent (zombie 
columns: dropping a field, and then recreating a field with the same name). 
That being said, it is a handy feature. I would prioritize safety over 
performance, since you write once and read many times.
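
   A toy sketch of the zombie-column problem, in plain Python rather than 
pyiceberg code. The helper names (`read_by_id`, `read_by_name`) and the sample 
data are hypothetical; the only point is that name-based resolution cannot 
tell a re-created field from the dropped one:

```python
# Hypothetical illustration only; none of these helpers exist in pyiceberg.

def read_by_id(file_columns, file_ids, table_schema):
    """Resolve columns by field ID, as is possible when files carry IDs."""
    by_id = {file_ids[name]: values for name, values in file_columns.items()}
    return {name: by_id.get(field_id) for name, field_id in table_schema.items()}


def read_by_name(file_columns, table_schema):
    """Resolve columns by name, the fallback when a file has no field IDs."""
    return {name: file_columns.get(name) for name in table_schema}


# Data file written while the table was: id (field 1), price (field 2).
old_file = {"id": [1, 2], "price": [9.5, 7.0]}
old_file_ids = {"id": 1, "price": 2}

# Table schema after dropping 'price' and adding a new 'price' (field 3).
schema_now = {"id": 1, "price": 3}

with_ids = read_by_id(old_file, old_file_ids, schema_now)
without_ids = read_by_name(old_file, schema_now)

print(with_ids["price"])     # the new field has no data in the old file
print(without_ids["price"])  # the dropped column's values resurface
```

   With IDs, the old file contributes nothing to the new `price` field; 
without them, the dropped column's values come back as a zombie.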
   
   > I think the default behavior for pyiceberg should be to fail if attempting 
to write ns precision timestamps to <=v2 table. 
   
   To add to this: some consumers of tables don't support V3 tables (yet), so 
upgrading a table should be a conscious decision.
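
   A minimal sketch of the fail-by-default policy discussed here, in plain 
Python. The function name `resolve_timestamp_unit` and its parameters are 
hypothetical, and keeping 'ns' as-is on a V3 table is an assumption rather 
than something this PR implements:

```python
# Hypothetical sketch of the policy; not the code in pyiceberg/io/pyarrow.py.

def resolve_timestamp_unit(unit: str, format_version: int = 2, downcast_ns: bool = False) -> str:
    """Return the unit a timestamp column would be written with, or raise."""
    if unit in ("s", "ms", "us"):
        # Upcasting to microseconds loses no precision.
        return "us"
    if unit == "ns":
        if format_version >= 3:
            # Assumption: a V3 table can keep nanosecond precision.
            return "ns"
        if downcast_ns:
            # Explicit opt-in: sub-microsecond digits are dropped.
            return "us"
        # Default: fail instead of silently losing precision or
        # implicitly upgrading the table.
        raise TypeError(
            "ns timestamps need an explicit downcast opt-in or a V3 table"
        )
    raise ValueError(f"Unknown timestamp unit: {unit!r}")


print(resolve_timestamp_unit("ms"))
print(resolve_timestamp_unit("ns", downcast_ns=True))
```

   The table's format version is only ever read here, never changed, so an 
upgrade stays a separate, deliberate step.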



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


