Fokko commented on code in PR #848: URL: https://github.com/apache/iceberg-python/pull/848#discussion_r1666586081
##########
pyiceberg/io/pyarrow.py:
##########
@@ -918,11 +919,24 @@ def primitive(self, primitive: pa.DataType) -> PrimitiveType:
             return TimeType()
         elif pa.types.is_timestamp(primitive):
             primitive = cast(pa.TimestampType, primitive)
-            if primitive.unit == "us":
-                if primitive.tz == "UTC" or primitive.tz == "+00:00":
-                    return TimestamptzType()
-                elif primitive.tz is None:
-                    return TimestampType()
+            if primitive.unit in ("s", "ms", "us"):
+                # Supported types, will be upcast automatically to 'us'
+                pass
+            elif primitive.unit == "ns":
+                if Config().get_bool("downcast-ns-timestamp-on-write"):

Review Comment:
   > Making sure the add_files API is as safe as possible while still being performant sounds like a good idea to me.

   One of the downsides of `add_files` is that the field IDs are not written in the Parquet files. This limits schema evolution to some extent (zombie columns: dropping a field and then recreating a field with the same name). That being said, it is a handy feature. I would prioritize safety over performance, since you write once and read many times.

   > I think the default behavior for pyiceberg should be to fail if attempting to write ns precision timestamps to <=v2 table.

   To add here: there will be consumers of tables that don't support V3 tables (yet), so upgrading a table should be a conscious action on the table.
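
To make the default discussed in the review comment concrete, here is a minimal sketch of a "fail unless explicitly opted in" rule for nanosecond timestamps. It is not the code from this PR: the function name `resolve_timestamp_unit`, its parameters, and the error message are hypothetical, and the branch that keeps `ns` for format version 3 and above is an assumption drawn from the comment's mention of V3 tables.

```python
# Hypothetical sketch; names and signature are illustrative, not pyiceberg's API.
SUPPORTED_UNITS = ("s", "ms", "us")


def resolve_timestamp_unit(unit: str, format_version: int, downcast_ns: bool = False) -> str:
    """Return the unit a timestamp column would be stored with, or raise."""
    if unit in SUPPORTED_UNITS:
        # Coarser units can be upcast to microseconds without losing information.
        return "us"
    if unit == "ns":
        if format_version >= 3:
            # Assumption: nanosecond precision is only available from V3 onwards.
            return "ns"
        if downcast_ns:
            # Explicit opt-in: the caller accepts truncation to microseconds.
            return "us"
        # Default: fail rather than silently lose precision or upgrade the table.
        raise TypeError(
            "ns timestamps cannot be written to a table with format version <= 2; "
            "enable downcasting or upgrade the table explicitly"
        )
    raise ValueError(f"Unsupported timestamp unit: {unit!r}")
```

With this shape, `resolve_timestamp_unit("ns", 2)` raises, while passing `downcast_ns=True` returns `"us"`. Both truncating precision and moving to V3 stay deliberate choices by the writer, which matters because some consumers cannot read V3 tables yet.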