HonahX commented on code in PR #305: URL: https://github.com/apache/iceberg-python/pull/305#discussion_r1468971880
########## pyiceberg/io/pyarrow.py: ########## @@ -906,6 +986,76 @@ def after_map_value(self, element: pa.Field) -> None: self._field_names.pop() +class _ConvertToIcebergWithFreshIds(PreOrderPyArrowSchemaVisitor[Union[IcebergType, Schema]]): Review Comment: Thanks @syun64 for sharing the comprehensive analysis! > assign IDs in pre-order Just want to make sure we are on the same page, my understanding is that `_SetFreshIds` is a little different from a normal pre-order visitor because it [increment the counter for each field and then visit children](https://github.com/apache/iceberg/pull/5627#discussion_r956788070). This makes it also rely on `reserved_ids` to store new ids for fields on the same level. We want to maintain this behavior when refactoring the `_SetFreshIds`. When refactoring, we could probably do it like the [jave implementation](https://github.com/apache/iceberg/blob/c07f2aabc0a1d02f068ecf1514d2479c0fbdd3b0/api/src/main/java/org/apache/iceberg/types/AssignFreshIds.java#L86-L97) > assign all field_ids as '-1' using _ConvertToIceberg/_ConvertToIcebergFreshIDs? And then use the refactored _SetFreshIDs visitor to assign the IDs in order? Overall, this approach sounds reasonable to me if we can find an easy way to refactor the `_SetFreshIds`. @Fokko, I'd appreciate your perspective on refactoring `_SetFreshIds`. Do you see any issues with this approach? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org