HonahX commented on code in PR #305:
URL: https://github.com/apache/iceberg-python/pull/305#discussion_r1468971880
##########
pyiceberg/io/pyarrow.py:
##########
@@ -906,6 +986,76 @@ def after_map_value(self, element: pa.Field) -> None:
self._field_names.pop()
+class
_ConvertToIcebergWithFreshIds(PreOrderPyArrowSchemaVisitor[Union[IcebergType,
Schema]]):
Review Comment:
Thanks @syun64 for sharing the comprehensive analysis!
> assign IDs in pre-order
Just want to make sure we are on the same page, my understanding is that
`_SetFreshIds` is a little different from a normal pre-order visitor because it
[increment the counter for each field and then visit
children](https://github.com/apache/iceberg/pull/5627#discussion_r956788070).
This makes it also rely on `reserved_ids` to store new ids for fields on the
same level. We want to maintain this behavior when refactoring the
`_SetFreshIds`. When refactoring, we could probably do it like the [jave
implementation](https://github.com/apache/iceberg/blob/c07f2aabc0a1d02f068ecf1514d2479c0fbdd3b0/api/src/main/java/org/apache/iceberg/types/AssignFreshIds.java#L86-L97)
> assign all field_ids as '-1' using
_ConvertToIceberg/_ConvertToIcebergFreshIDs? And then use the refactored
_SetFreshIDs visitor to assign the IDs in order?
Overall, this approach sounds reasonable to me if we can find an easy way to
refactor the `_SetFreshIds`. @Fokko, I'd appreciate your perspective on
refactoring `_SetFreshIds`. Do you see any issues with this approach?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]