HonahX commented on code in PR #848:
URL: https://github.com/apache/iceberg-python/pull/848#discussion_r1650427316
########## pyiceberg/io/pyarrow.py ##########

```diff
@@ -918,11 +919,24 @@ def primitive(self, primitive: pa.DataType) -> PrimitiveType:
             return TimeType()
         elif pa.types.is_timestamp(primitive):
             primitive = cast(pa.TimestampType, primitive)
-            if primitive.unit == "us":
-                if primitive.tz == "UTC" or primitive.tz == "+00:00":
-                    return TimestamptzType()
-                elif primitive.tz is None:
-                    return TimestampType()
+            if primitive.unit in ("s", "ms", "us"):
+                # Supported types, will be upcast automatically to 'us'
+                pass
+            elif primitive.unit == "ns":
+                if Config().get_bool("downcast-ns-timestamp-on-write"):
```

Review Comment:

How about making `downcast_ns_timestamp` a parameter of `schema_to_pyarrow`, and reading the Config from the yml only when we use this API on write? `schema_to_pyarrow` itself seems to be a useful public API, so it may be good to explicitly expose the optional downcast.

This would also help mitigate an edge case: since `pyarrow_to_schema` is used for both read and write, enabling this option also allows the `ns` unit to pass schema conversion when reading. For example, if users add a parquet file with `ns` timestamps and try to read the table as arrow, the read will pass the `pyarrow_to_schema` check and then stop at `to_requested_schema` with:

```
pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data:
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org