syun64 commented on code in PR #848:
URL: https://github.com/apache/iceberg-python/pull/848#discussion_r1667028145


##########
pyiceberg/io/pyarrow.py:
##########
@@ -918,11 +919,24 @@ def primitive(self, primitive: pa.DataType) -> PrimitiveType:
             return TimeType()
         elif pa.types.is_timestamp(primitive):
             primitive = cast(pa.TimestampType, primitive)
-            if primitive.unit == "us":
-                if primitive.tz == "UTC" or primitive.tz == "+00:00":
-                    return TimestamptzType()
-                elif primitive.tz is None:
-                    return TimestampType()
+            if primitive.unit in ("s", "ms", "us"):
+                # Supported types, will be upcast automatically to 'us'
+                pass
+            elif primitive.unit == "ns":
+                if Config().get_bool("downcast-ns-timestamp-on-write"):

Review Comment:
   Hi folks - thank you all for the valuable feedback. It sounds like what 
we want is for the behavior to be driven by the configuration flag, but for 
that flag to be passed as a parameter to the `schema_to_pyarrow` API, so that 
the API's behavior is fully controlled by its input parameters.
   
   I've made the following changes:
   1. Introduced `downcast_ns_timestamp_to_us` as a new input parameter to the 
`pyarrow_to_schema` and `to_requested_schema` public APIs.
   2. The `table` and `catalog` level functions (e.g. 
`_check_schema_compatible` and `_convert_schema_if_needed`) now infer the flag 
from the Config on write.
   3. Always downcast `ns` to `us` on read if there is an `ns` timestamp in 
the Parquet file. We will want to revise this behavior when we introduce 
nanosecond support in the V3 spec, but until then I think it's a reasonable 
assumption that data files in Iceberg will only be read with microsecond 
precision. 
https://github.com/apache/iceberg-python/pull/848/files#diff-8d5e63f2a87ead8cebe2fd8ac5dcf2198d229f01e16bb9e06e21f7277c328abdR1030-R1033
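
To make the three points above concrete, here is a minimal, library-free sketch of the intended control flow. It is not the code in this PR: `iceberg_timestamp_unit`, `flag_from_config`, `unit_for_write` and `unit_for_read` are hypothetical names, the config is modeled as a plain mapping, and the string parsing of the flag value is an assumption. Only the flag name `downcast-ns-timestamp-on-write` and the parameter name `downcast_ns_timestamp_to_us` come from the change itself.

```python
from typing import Mapping


def iceberg_timestamp_unit(unit: str, downcast_ns_timestamp_to_us: bool = False) -> str:
    """Hypothetical helper: return the unit a timestamp would be stored with.

    The result depends only on the arguments, never on global config.
    """
    if unit in ("s", "ms", "us"):
        # Supported units; coarser ones are upcast to microseconds.
        return "us"
    if unit == "ns":
        if downcast_ns_timestamp_to_us:
            return "us"
        # Error type and message are assumptions made for this sketch.
        raise TypeError("'ns' timestamps need downcast_ns_timestamp_to_us=True")
    raise TypeError(f"Unsupported timestamp unit: {unit}")


def flag_from_config(config: Mapping[str, str]) -> bool:
    """Hypothetical stand-in for reading the flag from the Config (assumed parsing)."""
    return config.get("downcast-ns-timestamp-on-write", "false").lower() == "true"


def unit_for_write(unit: str, config: Mapping[str, str]) -> str:
    # Write path: the caller reads the config once and passes the flag down.
    return iceberg_timestamp_unit(unit, downcast_ns_timestamp_to_us=flag_from_config(config))


def unit_for_read(unit: str) -> str:
    # Read path: always downcast, independent of the config.
    return iceberg_timestamp_unit(unit, downcast_ns_timestamp_to_us=True)
```

Keeping the config lookup at the outer layer means the conversion function can be tested by passing `True` or `False` directly, without touching any configuration.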



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.