HonahX commented on code in PR #848:
URL: https://github.com/apache/iceberg-python/pull/848#discussion_r1650427316
########## pyiceberg/io/pyarrow.py ##########

```diff
@@ -918,11 +919,24 @@ def primitive(self, primitive: pa.DataType) -> PrimitiveType:
             return TimeType()
         elif pa.types.is_timestamp(primitive):
             primitive = cast(pa.TimestampType, primitive)
-            if primitive.unit == "us":
-                if primitive.tz == "UTC" or primitive.tz == "+00:00":
-                    return TimestamptzType()
-                elif primitive.tz is None:
-                    return TimestampType()
+            if primitive.unit in ("s", "ms", "us"):
+                # Supported types, will be upcast automatically to 'us'
+                pass
+            elif primitive.unit == "ns":
+                if Config().get_bool("downcast-ns-timestamp-on-write"):
```

Review Comment:

How about making `downcast_ns_timestamp` a parameter of `schema_to_pyarrow`, and reading the Config from the yml only when we use this API on write? `schema_to_pyarrow` itself seems to be a useful public API, so it may be good to explicitly expose the optional downcast.

This would also help mitigate an edge case: since `pyarrow_to_schema` is used for both read and write, enabling this option also allows the `ns` unit to pass schema conversion when reading. For example, if users add a parquet file with `ns` timestamps and try to read the table as arrow, the read will pass the `pyarrow_to_schema` check and then stop at `to_requested_schema` with:

```
pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data:
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org