syun64 commented on code in PR #848:
URL: https://github.com/apache/iceberg-python/pull/848#discussion_r1667216339


##########
pyiceberg/io/pyarrow.py:
##########
@@ -918,11 +919,24 @@ def primitive(self, primitive: pa.DataType) -> 
PrimitiveType:
             return TimeType()
         elif pa.types.is_timestamp(primitive):
             primitive = cast(pa.TimestampType, primitive)
-            if primitive.unit == "us":
-                if primitive.tz == "UTC" or primitive.tz == "+00:00":
-                    return TimestamptzType()
-                elif primitive.tz is None:
-                    return TimestampType()
+            if primitive.unit in ("s", "ms", "us"):
+                # Supported types, will be upcast automatically to 'us'
+                pass
+            elif primitive.unit == "ns":
+                if Config().get_bool("downcast-ns-timestamp-on-write"):

Review Comment:
   > I also got some issues with the nanosecond timestamp when collecting 
statistics:
   > 
   > ```
   > >   ???
   > E   ValueError: Nanosecond resolution temporal type 1615967687249846175 is 
not safely convertible to microseconds to convert to datetime.datetime. Install 
pandas to return as Timestamp with nanosecond support or access the .value 
attribute.
   > ```
   > 
   > At the lines:
   > 
   > 
https://github.com/apache/iceberg-python/blob/7afd6d677d0fbc73d6baa8a5b57bdd889d0afe87/pyiceberg/io/pyarrow.py#L1870-L1871
   > 
   > This got fixed after updating this to:
   > 
   > ```python
   >                     col_aggs[field_id].update_min(statistics.min_raw)
   >                     col_aggs[field_id].update_max(statistics.max_raw)
   > ```
   
   I tried making this change and realized that this causes our serialization 
to break because it introduces `bytes` values in our 
[statistics](https://arrow.apache.org/docs/python/generated/pyarrow.parquet.Statistics.html#pyarrow.parquet.Statistics.min_raw),
 which cannot be serialized (since it already is). I will need to spend a bit 
more time to figure out the right change to StatsAggregator to support this 
change. I also failed to reproduce this issue in my environment (possibly 
because it has pandas installed) so I'm reverting this change for now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to