HonahX commented on issue #290:
URL: https://github.com/apache/iceberg-python/issues/290#issuecomment-1903304501

   Hi @kevinjqliu. In PyIceberg, `update_schema()...commit()` increments 
the schema ID:
   
https://github.com/apache/iceberg-python/blob/a56838dc5d9acc5f0e0d70919bfc433c7d0756f1/pyiceberg/table/__init__.py#L1871
   I think what you observed in `test_base.py` happens because the 
`_commit_table` implementation in the in-memory catalog uses 
`new_table_metadata` instead of 
[`update_table_metadata`](https://github.com/apache/iceberg-python/blob/a56838dc5d9acc5f0e0d70919bfc433c7d0756f1/pyiceberg/table/__init__.py#L612-L628)
 to commit changes. 
   
   `new_table_metadata` is normally used to create a new table, so it 
re-creates the schema from the given one, assigning new field IDs and 
resetting `schema-id` to the default (0). `update_table_metadata`, in contrast, 
updates the metadata using the schema generated by 
`update_schema()...commit()`, which already has its `schema-id` incremented. 
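
   To make the difference concrete, here is a minimal toy model of the two 
paths. It does not use PyIceberg at all: `Field`, `Schema`, `Metadata`, 
`create_style_metadata`, `update_style_metadata`, and `evolve` are hypothetical 
names made up for illustration, and the real functions do considerably more. 
It only sketches why committing through a create-style path loses the 
incremented `schema-id`.

```python
from dataclasses import dataclass
from typing import Tuple


# All names below are hypothetical stand-ins, not PyIceberg classes.
@dataclass(frozen=True)
class Field:
    field_id: int
    name: str


@dataclass(frozen=True)
class Schema:
    schema_id: int
    fields: Tuple[Field, ...]


@dataclass(frozen=True)
class Metadata:
    schemas: Tuple[Schema, ...]
    current_schema_id: int

    def current_schema(self) -> Schema:
        return next(s for s in self.schemas if s.schema_id == self.current_schema_id)


def create_style_metadata(schema: Schema) -> Metadata:
    """Toy create-table path: reassign field ids and reset the schema id to 0."""
    fresh_fields = tuple(Field(i, f.name) for i, f in enumerate(schema.fields, start=1))
    fresh = Schema(schema_id=0, fields=fresh_fields)
    return Metadata(schemas=(fresh,), current_schema_id=0)


def update_style_metadata(base: Metadata, new_schema: Schema) -> Metadata:
    """Toy update path: keep the schema history and adopt the evolved schema as-is."""
    return Metadata(
        schemas=base.schemas + (new_schema,),
        current_schema_id=new_schema.schema_id,
    )


def evolve(base: Metadata, new_column: str) -> Schema:
    """Toy schema evolution: add one column and bump the schema id."""
    current = base.current_schema()
    next_field_id = max(f.field_id for f in current.fields) + 1
    next_schema_id = max(s.schema_id for s in base.schemas) + 1
    return Schema(next_schema_id, current.fields + (Field(next_field_id, new_column),))


base = create_style_metadata(Schema(0, (Field(1, "id"),)))
evolved = evolve(base, "name")  # evolved.schema_id == 1

via_create = create_style_metadata(evolved)        # what the workaround effectively does
via_update = update_style_metadata(base, evolved)  # what a regular commit does

print(via_create.current_schema_id)  # 0: the increment is lost
print(via_update.current_schema_id)  # 1: the increment is preserved
```

   In this toy model the create-style path also discards the schema history, 
since it keeps only the single re-created schema.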
   
   For example, this test shows that the `schema-id` is incremented when we use 
GlueCatalog, which has a full implementation of `_commit_table`:
   
https://github.com/apache/iceberg-python/blob/a56838dc5d9acc5f0e0d70919bfc433c7d0756f1/tests/catalog/test_glue.py#L526-L555
   
   I think the `_commit_table` in the InMemoryCatalog was implemented before 
we had a way to update the table metadata. It was a workaround for writing 
basic tests for the transaction API and schema evolution. We should update the 
implementation to use `update_table_metadata`. 
   
   I notice that you've already opened a PR for InMemoryCatalog and another for 
HiveCatalog. Thank you so much for the contribution!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

