Fokko commented on code in PR #305: URL: https://github.com/apache/iceberg-python/pull/305#discussion_r1470106203
########## pyiceberg/schema.py: ########## @@ -1221,50 +1221,57 @@ def assign_fresh_schema_ids(schema_or_type: Union[Schema, IcebergType], next_id: class _SetFreshIDs(PreOrderSchemaVisitor[IcebergType]): """Traverses the schema and assigns monotonically increasing ids.""" - reserved_ids: Dict[int, int] + old_id_to_new_id: Dict[int, int] Review Comment: > CREATE TABLE, which is the only operation we've supported so far doesn't need to compare any IDs against a baseSchema. In the java code, idFor function uses the existing ID from the baseSchema by searching it by name if it exists. If it doesn't exist, it assigns the new ID from the counter. Exactly, we could just pass in an empty schema (to avoid all the null-checks). > Instead, it returns the schema object and searches the new identifier field IDs by name. Is there a reason we decided to step away from this approach in the Python implementation? I don't think there was an explicit choice made there. The lookup by column name is correct, we do the same when evolving the schema: https://github.com/apache/iceberg-python/blob/a3e36838b05e4ccccac434ab394af758054b12d6/pyiceberg/table/__init__.py#L1404 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org