syun64 commented on code in PR #848: URL: https://github.com/apache/iceberg-python/pull/848#discussion_r1667216339
########## pyiceberg/io/pyarrow.py: ########## @@ -918,11 +919,24 @@ def primitive(self, primitive: pa.DataType) -> PrimitiveType: return TimeType() elif pa.types.is_timestamp(primitive): primitive = cast(pa.TimestampType, primitive) - if primitive.unit == "us": - if primitive.tz == "UTC" or primitive.tz == "+00:00": - return TimestamptzType() - elif primitive.tz is None: - return TimestampType() + if primitive.unit in ("s", "ms", "us"): + # Supported types, will be upcast automatically to 'us' + pass + elif primitive.unit == "ns": + if Config().get_bool("downcast-ns-timestamp-on-write"): Review Comment: > I also got some issues with the nanosecond timestamp when collecting statistics: > > ``` > > ??? > E ValueError: Nanosecond resolution temporal type 1615967687249846175 is not safely convertible to microseconds to convert to datetime.datetime. Install pandas to return as Timestamp with nanosecond support or access the .value attribute. > ``` > > At the lines: > > https://github.com/apache/iceberg-python/blob/7afd6d677d0fbc73d6baa8a5b57bdd889d0afe87/pyiceberg/io/pyarrow.py#L1870-L1871 > > This got fixed after updating this to: > > ```python > col_aggs[field_id].update_min(statistics.min_raw) > col_aggs[field_id].update_max(statistics.max_raw) > ``` I tried making this change and realized that this causes our serialization to break because it introduces `bytes` values in our [statistics](https://arrow.apache.org/docs/python/generated/pyarrow.parquet.Statistics.html#pyarrow.parquet.Statistics.min_raw), which cannot be serialized (since it already is). I will need to spend a bit more time to figure out the right change to StatsAggregator to support this change. I also failed to reproduce this issue in my environment (possibly because it has pandas installed) so I'm reverting this change for now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org