Re: [PR] `create_table` with a PyArrow Schema [iceberg-python]

via GitHub Mon, 29 Jan 2024 11:36:08 -0800


Fokko commented on code in PR #305:
URL: https://github.com/apache/iceberg-python/pull/305#discussion_r1470106203



##########
pyiceberg/schema.py:
##########
@@ -1221,50 +1221,57 @@ def assign_fresh_schema_ids(schema_or_type: 
Union[Schema, IcebergType], next_id:
 class _SetFreshIDs(PreOrderSchemaVisitor[IcebergType]):
     """Traverses the schema and assigns monotonically increasing ids."""
 
-    reserved_ids: Dict[int, int]
+    old_id_to_new_id: Dict[int, int]

Review Comment:
   > CREATE TABLE, which is the only operation we've supported so far doesn't 
need to compare any IDs against a baseSchema. In the java code, idFor function 
uses the existing ID from the baseSchema by searching it by name if it exists. 
If it doesn't exist, it assigns the new ID from the counter.
   
   Exactly, we could just pass in an empty schema (to avoid all the 
null-checks).
   
   > Instead, it returns the schema object and searches the new identifier 
field IDs by name. Is there a reason we decided to step away from this approach 
in the Python implementation?
   
   I don't think there was an explicit choice made there. The lookup by column 
name is correct, we do the same when evolving the schema: 
   
https://github.com/apache/iceberg-python/blob/a3e36838b05e4ccccac434ab394af758054b12d6/pyiceberg/table/__init__.py#L1404



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] `create_table` with a PyArrow Schema [iceberg-python]

Reply via email to