syun64 commented on code in PR #305: URL: https://github.com/apache/iceberg-python/pull/305#discussion_r1466774496
##########
pyiceberg/io/pyarrow.py:
##########
@@ -906,6 +986,76 @@ def after_map_value(self, element: pa.Field) -> None:
         self._field_names.pop()

+class _ConvertToIcebergWithFreshIds(PreOrderPyArrowSchemaVisitor[Union[IcebergType, Schema]]):

Review Comment:
   Here's my understanding so far (please let me know if I overlooked anything):
   - new_table_metadata requires a Schema.
   - Right now, a Schema cannot be created without field_ids assigned.
   - assign_fresh_schema_ids / [_SetFreshIDs](https://github.com/apache/iceberg-python/blob/0f08806d4431d5d60998dac1bca5780b6d2e2785/pyiceberg/schema.py#L1221) requires each field to already have a unique ID before it can assign fresh IDs, so we can't use a hack like assigning -1 to every field and then relying on assign_fresh_schema_ids to yield the correct result.

   I think the alternative would be to update **_ConvertToIceberg** to generate arbitrary but unique IDs for each field in post-order, and then rely on new_table_metadata to correctly assign the IDs from last_update_id.
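
To illustrate the point about unique IDs, here is a minimal sketch. It is a toy model, not the pyiceberg implementation: `assign_fresh_ids` is a hypothetical helper, and it assumes that fresh-ID assignment tracks an old-ID-to-new-ID mapping, which is my reading of why _SetFreshIDs needs unique input IDs.

```python
from itertools import count
from typing import Dict, List, Tuple


def assign_fresh_ids(fields: List[Tuple[int, str]]) -> Dict[str, int]:
    """Hypothetical stand-in for fresh-ID assignment (not pyiceberg code).

    Assumption: new IDs are reserved in a mapping keyed by the old field ID
    and looked up again afterwards.
    """
    counter = count(1)
    old_to_new: Dict[int, int] = {}

    # Pass 1: reserve a new ID for every old ID.
    for old_id, _name in fields:
        old_to_new[old_id] = next(counter)

    # Pass 2: resolve each field's new ID through its old ID.
    return {name: old_to_new[old_id] for old_id, name in fields}


# Unique (even if arbitrary) input IDs map cleanly to 1..n.
unique = assign_fresh_ids([(10, "id"), (20, "name"), (30, "ts")])

# A shared placeholder such as -1 makes every field overwrite the same key,
# so all fields resolve to the last reserved ID.
placeholder = assign_fresh_ids([(-1, "id"), (-1, "name"), (-1, "ts")])

print(unique)       # {'id': 1, 'name': 2, 'ts': 3}
print(placeholder)  # {'id': 3, 'name': 3, 'ts': 3}
```

Under that assumption, any distinct input IDs work, which is why generating arbitrary unique IDs during the PyArrow-to-Iceberg conversion would be enough.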