Fokko commented on code in PR #305:
URL: https://github.com/apache/iceberg-python/pull/305#discussion_r1470106203
##########
pyiceberg/schema.py:
##########
@@ -1221,50 +1221,57 @@ def assign_fresh_schema_ids(schema_or_type:
Union[Schema, IcebergType], next_id:
class _SetFreshIDs(PreOrderSchemaVisitor[IcebergType]):
"""Traverses the schema and assigns monotonically increasing ids."""
- reserved_ids: Dict[int, int]
+ old_id_to_new_id: Dict[int, int]
Review Comment:
> CREATE TABLE, which is the only operation we've supported so far doesn't
need to compare any IDs against a baseSchema. In the java code, idFor function
uses the existing ID from the baseSchema by searching it by name if it exists.
If it doesn't exist, it assigns the new ID from the counter.
Exactly, we could just pass in an empty schema (to avoid all the
null-checks).
> Instead, it returns the schema object and searches the new identifier
field IDs by name. Is there a reason we decided to step away from this approach
in the Python implementation?
I don't think there was an explicit choice made there. The lookup by column
name is correct, we do the same when evolving the schema:
https://github.com/apache/iceberg-python/blob/a3e36838b05e4ccccac434ab394af758054b12d6/pyiceberg/table/__init__.py#L1404
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]