HonahX commented on code in PR #305: URL: https://github.com/apache/iceberg-python/pull/305#discussion_r1467303335
########## pyiceberg/io/pyarrow.py: ########## @@ -906,6 +986,76 @@ def after_map_value(self, element: pa.Field) -> None: self._field_names.pop() +class _ConvertToIcebergWithFreshIds(PreOrderPyArrowSchemaVisitor[Union[IcebergType, Schema]]): Review Comment: Thanks @syun64 for the summarization. I fully agree with your reasoning. I raised that question just to make sure we all agree that the minimum requirement for the new visitor here is to assign **arbitrarily unique IDs** for each field. Based on our discussions, there are the following options: 1. update `_ConvertToIceberg` to generate some arbitrarily unique IDs for each field in post-order 2. Add a `_ConvertToIcebergWithFreshIds` post-order visitor 3. Add a `_ConvertToIcebergWithFreshIds` pre-order visitor + `pre_order_visit_pyarrow` Option 1 has almost no code duplication and requires slight changes only. Option 2 requires some code duplication, which can be reduced by re-using `primitive()`(the current implementation) or reduced further by implementing some sort of inheritance as https://github.com/apache/iceberg-python/pull/219#discussion_r1456922888. Option 3 requires some code duplication, which can be reduced by re-using `primitive()` (the current implementation) Since the schema generated by option 3 will still have some difference in field ids assignment than that by `assign_fresh_schema_ids`, I am more inclined toward option 1 and 2. I think we might not need the `pre_order_visit_pyarrow` and `PreOrderPyArrowSchemaVisitor`. For option 1 and 2, I have no strong preference. If there is no further concern about mixing this special case for table creation with the visitor implementaion to read table, option 1 is optimal. If we still think it may be better to separate the logic from `_ConvertToIceberg`, considering this [concern](https://github.com/apache/iceberg-python/pull/219#issuecomment-1859273052), then option 2 can be a compromise choice. WDYT? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org