syun64 commented on code in PR #921:
URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1678639461
##########
pyiceberg/io/pyarrow.py:
##########
@@ -1549,9 +1552,16 @@ def __init__(self, iceberg_type: PrimitiveType, physical_type_string: str, trunc
         expected_physical_type = _primitive_to_physical(iceberg_type)
         if expected_physical_type != physical_type_string:
-            raise ValueError(
-                f"Unexpected physical type {physical_type_string} for {iceberg_type}, expected {expected_physical_type}"
-            )
+            # Allow promotable physical types
+            # INT32 -> INT64 and FLOAT -> DOUBLE are safe type casts
+            if (physical_type_string == "INT32" and expected_physical_type == "INT64") or (
+                physical_type_string == "FLOAT" and expected_physical_type == "DOUBLE"
Review Comment:
I added this logic so that StatsAggregator can collect stats for files added
through `add_files` whose field types map to broader Iceberg schema types
(for example, an INT32 column in the file for a field whose expected physical
type is INT64). The check feels overly specific, and it duplicates the type
[promote](https://github.com/apache/iceberg-python/blob/e27cd9095503cfe9fa7e0a806ba25d42920c68c5/pyiceberg/schema.py#L1551)
mappings in a different format. I'm open to other ideas if we want to keep
this check on the Parquet physical types.
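
One possible alternative to the hard-coded condition in the diff above is to keep the allowed widenings in a single lookup table. The following is only a sketch under stated assumptions, not what the PR implements: `_PROMOTABLE_PHYSICAL_TYPES` and `check_physical_type` are hypothetical names, and `iceberg_type` is passed as a plain string here so the example runs without pyiceberg.

```python
from typing import Dict, FrozenSet

# Hypothetical table: physical type found in the file -> physical types it
# can be safely widened to. Only the two casts named in the diff are listed.
_PROMOTABLE_PHYSICAL_TYPES: Dict[str, FrozenSet[str]] = {
    "INT32": frozenset({"INT64"}),
    "FLOAT": frozenset({"DOUBLE"}),
}


def check_physical_type(
    physical_type_string: str, expected_physical_type: str, iceberg_type: str
) -> None:
    """Raise ValueError unless the file type equals or safely widens to the expected type."""
    if physical_type_string == expected_physical_type:
        return
    allowed = _PROMOTABLE_PHYSICAL_TYPES.get(physical_type_string, frozenset())
    if expected_physical_type in allowed:
        return
    raise ValueError(
        f"Unexpected physical type {physical_type_string} for {iceberg_type}, "
        f"expected {expected_physical_type}"
    )
```

With this shape, supporting another promotion would mean adding one table entry rather than another boolean clause. It still duplicates the promotion rules at the physical-type level, so it does not address the duplication concern by itself.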
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]