HonahX commented on issue #290: URL: https://github.com/apache/iceberg-python/issues/290#issuecomment-1903304501
Hi @kevinjqliu. In PyIceberg, `update_schema()...commit()` does increment the schema id: https://github.com/apache/iceberg-python/blob/a56838dc5d9acc5f0e0d70919bfc433c7d0756f1/pyiceberg/table/__init__.py#L1871

I think what you observed in `test_base.py` happens because the `_commit_table` implementation in the in-memory catalog uses `new_table_metadata` instead of [`update_table_metadata`](https://github.com/apache/iceberg-python/blob/a56838dc5d9acc5f0e0d70919bfc433c7d0756f1/pyiceberg/table/__init__.py#L612-L628) to commit changes. `new_table_metadata` is normally used when we want to create a new table, so it re-creates the schema from the given one, assigning new field ids and resetting `schema-id` to the default (0). `update_table_metadata`, in contrast, updates the metadata using the schema generated by `update_schema()...commit()`, which has its `schema-id` incremented.

For example, this test shows that the `schema-id` is incremented if we use GlueCatalog, which has the formal implementation of `_commit_table`: https://github.com/apache/iceberg-python/blob/a56838dc5d9acc5f0e0d70919bfc433c7d0756f1/tests/catalog/test_glue.py#L526-L555

I think the `_commit_table` in the InMemoryCatalog was implemented when we did not yet have a way to update the table metadata. It was a workaround for writing basic tests for the transaction API and schema evolution. We should update the implementation to use `update_table_metadata`.

I notice that you've already opened a PR for InMemoryCatalog and another for HiveCatalog. Thank you so much for the contribution!

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
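To make the difference described in the comment concrete, here is a minimal toy model in plain Python. It does not import or call PyIceberg: every name prefixed with `Toy` or `toy_` is hypothetical and only mimics the behaviour described above (the create-table path resets `schema-id` to 0, the update path keeps the incremented id). Field id reassignment is left out to keep the sketch short.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class ToySchema:
    """Hypothetical stand-in for a schema: an id plus field names."""
    schema_id: int
    field_names: Tuple[str, ...]


@dataclass(frozen=True)
class ToyMetadata:
    """Hypothetical stand-in for table metadata."""
    schemas: Tuple[ToySchema, ...]
    current_schema_id: int


def toy_evolve_schema(metadata: ToyMetadata, new_field: str) -> ToySchema:
    """Mimics update_schema()...commit(): the new schema gets the next id."""
    current = next(
        s for s in metadata.schemas if s.schema_id == metadata.current_schema_id
    )
    next_id = max(s.schema_id for s in metadata.schemas) + 1
    return ToySchema(next_id, current.field_names + (new_field,))


def toy_new_table_metadata(schema: ToySchema) -> ToyMetadata:
    """Create-table path: rebuilds the schema and resets its id to 0."""
    fresh = replace(schema, schema_id=0)
    return ToyMetadata(schemas=(fresh,), current_schema_id=0)


def toy_update_table_metadata(metadata: ToyMetadata, schema: ToySchema) -> ToyMetadata:
    """Update path: appends the evolved schema and keeps its id."""
    return ToyMetadata(
        schemas=metadata.schemas + (schema,),
        current_schema_id=schema.schema_id,
    )


base = toy_new_table_metadata(ToySchema(0, ("id",)))
evolved = toy_evolve_schema(base, "name")

# Committing through the create-table path loses the incremented id.
recreated = toy_new_table_metadata(evolved)
# Committing through the update path preserves it.
updated = toy_update_table_metadata(base, evolved)

print(recreated.current_schema_id, updated.current_schema_id)
```

In this toy model the first commit path mirrors what the comment attributes to the in-memory catalog, and the second mirrors what it attributes to catalogs such as GlueCatalog.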
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org