Re: [PR] `create_table` with a PyArrow Schema [iceberg-python]

via GitHub Sun, 28 Jan 2024 17:12:57 -0800


HonahX commented on code in PR #305:
URL: https://github.com/apache/iceberg-python/pull/305#discussion_r1468971880



##########
pyiceberg/io/pyarrow.py:
##########
@@ -906,6 +986,76 @@ def after_map_value(self, element: pa.Field) -> None:
         self._field_names.pop()
 
 
+class 
_ConvertToIcebergWithFreshIds(PreOrderPyArrowSchemaVisitor[Union[IcebergType, 
Schema]]):

Review Comment:
   Thanks @syun64 for sharing the comprehensive analysis!
   
   > assign IDs in pre-order
   
   Just want to make sure we are on the same page, my understanding is that 
`_SetFreshIds` is a little different from a normal pre-order visitor because it 
[increment the counter for each field and then visit 
children](https://github.com/apache/iceberg/pull/5627#discussion_r956788070). 
This makes it also rely on `reserved_ids` to store new ids for fields on the 
same level. We want to maintain this behavior when refactoring the 
`_SetFreshIds`. When refactoring, we could probably do it like the [jave 
implementation](https://github.com/apache/iceberg/blob/c07f2aabc0a1d02f068ecf1514d2479c0fbdd3b0/api/src/main/java/org/apache/iceberg/types/AssignFreshIds.java#L86-L97)
   
   > assign all field_ids as '-1' using 
_ConvertToIceberg/_ConvertToIcebergFreshIDs? And then use the refactored 
_SetFreshIDs visitor to assign the IDs in order?
   
   Overall, this approach sounds reasonable to me if we can find an easy way to 
refactor the `_SetFreshIds`. @Fokko, I'd appreciate your perspective on 
refactoring `_SetFreshIds`. Do you see any issues with this approach?
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] `create_table` with a PyArrow Schema [iceberg-python]

Reply via email to