Re: [PR] create_table with PyArrow Schema [iceberg-python]

via GitHub Thu, 25 Jan 2024 10:31:41 -0800


syun64 commented on code in PR #305:
URL: https://github.com/apache/iceberg-python/pull/305#discussion_r1466774496



##########
pyiceberg/io/pyarrow.py:
##########
@@ -906,6 +986,76 @@ def after_map_value(self, element: pa.Field) -> None:
         self._field_names.pop()
 
 
+class _ConvertToIcebergWithFreshIds(PreOrderPyArrowSchemaVisitor[Union[IcebergType, Schema]]):

Review Comment:
   Here's my understanding so far (please let me know if I overlooked anything):
   
   - new_table_metadata requires a Schema
   - Right now, a Schema cannot be created without field_ids assigned
   - assign_fresh_schema_ids / [_SetFreshIDs](https://github.com/apache/iceberg-python/blob/0f08806d4431d5d60998dac1bca5780b6d2e2785/pyiceberg/schema.py#L1221) requires a unique ID per field to freshly assign the IDs, so we can't use a hack like assigning -1 for all the IDs, and then relying on assign_fresh_schema_ids to yield the correct result
   
   I think the alternative would be to update **_ConvertToIceberg** to generate arbitrary but unique IDs for each field in post-order, and then rely on new_table_metadata to correctly assign the IDs from last_update_id
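
To illustrate the point about placeholder IDs, here is a rough, self-contained sketch. It is a toy model only: `Field`, `fresh_id_mapping`, and `reassign` are hypothetical names and are not pyiceberg code. It assumes the reassignment step keeps a mapping keyed by the old field ID, which is why every field would need a unique ID going in.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List


@dataclass
class Field:
    """Hypothetical stand-in for a schema field (not pyiceberg's NestedField)."""

    field_id: int
    name: str


def fresh_id_mapping(fields: List[Field], start: int = 1) -> Dict[int, int]:
    """Toy reassignment: build an old-ID -> fresh-ID mapping (assumed behavior)."""
    counter = count(start)
    mapping: Dict[int, int] = {}
    for field in fields:
        # Fields that share an old ID overwrite each other's entry here.
        mapping[field.field_id] = next(counter)
    return mapping


def reassign(fields: List[Field]) -> List[Field]:
    """Return copies of the fields with IDs looked up through the mapping."""
    mapping = fresh_id_mapping(fields)
    return [Field(mapping[f.field_id], f.name) for f in fields]


names = ["id", "name", "ts"]

# The hack: every field gets -1 as a placeholder ID.
collided = reassign([Field(-1, n) for n in names])

# The alternative: arbitrary but unique placeholder IDs.
fresh = reassign([Field(i, n) for i, n in enumerate(names)])

print([f.field_id for f in collided])  # [3, 3, 3]
print([f.field_id for f in fresh])     # [1, 2, 3]
```

With the -1 placeholder, all three fields collapse onto a single mapping entry and end up with the same ID. With unique placeholders, each field receives its own fresh ID, regardless of what the placeholder values were.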



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

