kevinjqliu opened a new issue, #290:
URL: https://github.com/apache/iceberg-python/issues/290

   ### Apache Iceberg version
   
   0.5.0 (latest release)
   
   ### Please describe the bug 🐞
   
   When the schema of an Iceberg table is updated (for example, by adding a column), the `schema_id` should be incremented. PyIceberg currently leaves it unchanged.
   
   From the [Iceberg spec](https://iceberg.apache.org/spec/#schema-evolution):
   ```
   Evolution applies changes to the table’s current schema to produce a new 
schema that is identified by a unique schema ID, is added to the table’s list 
of schemas, and is set as the table’s current schema.
   ```
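
As a rough illustration of the three steps in that quote, here is a minimal, self-contained sketch. `ToySchema`, `ToyTableMetadata`, and `evolve` are hypothetical names made up for this example, not PyIceberg or Java classes, and the "highest existing id plus one" rule is an assumption about how a unique id could be chosen.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class ToySchema:
    """Hypothetical stand-in for a table schema: an id plus column names."""
    schema_id: int
    columns: Tuple[str, ...]


@dataclass
class ToyTableMetadata:
    """Hypothetical stand-in for table metadata holding all known schemas."""
    schemas: Dict[int, ToySchema] = field(default_factory=dict)
    current_schema_id: int = 0

    def evolve(self, new_columns: Tuple[str, ...]) -> ToySchema:
        # 1. Produce a new schema identified by a unique schema id
        #    (assumption: highest existing id plus one, starting at 0).
        new_id = max(self.schemas, default=-1) + 1
        evolved = ToySchema(schema_id=new_id, columns=new_columns)
        # 2. Add it to the table's list of schemas.
        self.schemas[new_id] = evolved
        # 3. Set it as the table's current schema.
        self.current_schema_id = new_id
        return evolved


metadata = ToyTableMetadata()
initial = metadata.evolve(("id", "name"))             # first schema
evolved = metadata.evolve(("id", "name", "new_col"))  # after adding a column
```

In this toy model the first schema gets id `0` and the evolved one gets id `1`, while the earlier schema stays in the list.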
   
   The Java unit test [`TestTableMetadata.java`](https://github.com/apache/iceberg/blob/e32df0ce08086758c44e9174c582638068244073/core/src/test/java/org/apache/iceberg/TestTableMetadata.java#L1497-L1527) shows the expected behavior.
   The newly created table schema has an id of `0`, i.e. `TableMetadata.INITIAL_SCHEMA_ID` (L1503).
   After `updateSchema` is called, the evolved table schema has an id of `1` (L1520).
   
   In comparison, in the Python unit test [`test_base.py`](https://github.com/apache/iceberg-python/blob/main/tests/catalog/test_base.py#L592-L618), the original table schema id is `0`, and it is still `0` after calling `update_schema()...commit()` (L602 & L616).
   
   Possible solution:
   In Java, the `schema_id` is incremented during schema evolution ([example1](https://github.com/apache/iceberg/blob/e32df0ce08086758c44e9174c582638068244073/core/src/main/java/org/apache/iceberg/SchemaUpdate.java#L183), [example2](https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/TableMetadata.java#L684)).
   
   In Python, schema evolution goes through the `assign_fresh_schema_ids` function ([example1](https://github.com/apache/iceberg-python/blob/94d7821cbc6b31b791e18d4f91c0991684616076/pyiceberg/table/__init__.py#L1517), [example2](https://github.com/apache/iceberg-python/blob/94d7821cbc6b31b791e18d4f91c0991684616076/pyiceberg/table/metadata.py#L407)).
   However, this function does not increment the schema id ([source](https://github.com/apache/iceberg-python/blame/94d7821cbc6b31b791e18d4f91c0991684616076/pyiceberg/schema.py#L1234-L1237)).
   Note that the `_get_and_increment` function there increments the field id, not the schema id.
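
To make the distinction between field ids and the schema id concrete, here is a small toy model. It is not PyIceberg's implementation: `SketchField`, `SketchSchema`, `assign_fresh_field_ids`, and `evolve_schema` are hypothetical names, and passing in `last_schema_id` is only one assumed way a fix could supply the next id.

```python
import itertools
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SketchField:
    field_id: int
    name: str


@dataclass(frozen=True)
class SketchSchema:
    schema_id: int
    fields: Tuple[SketchField, ...]


def assign_fresh_field_ids(schema: SketchSchema) -> SketchSchema:
    """Renumber field ids from 1 but carry the schema id over unchanged."""
    counter = itertools.count(1)  # plays the role of a field-id counter
    fields = tuple(SketchField(next(counter), f.name) for f in schema.fields)
    return SketchSchema(schema_id=schema.schema_id, fields=fields)


def evolve_schema(schema: SketchSchema, last_schema_id: int) -> SketchSchema:
    """Renumber field ids and also assign the next schema id."""
    fresh = assign_fresh_field_ids(schema)
    return SketchSchema(schema_id=last_schema_id + 1, fields=fresh.fields)


original = SketchSchema(0, (SketchField(10, "id"), SketchField(20, "new_col")))
only_fields = assign_fresh_field_ids(original)  # schema id is still 0
with_schema_id = evolve_schema(original, last_schema_id=original.schema_id)
```

In this sketch `only_fields` keeps schema id `0`, which mirrors the behavior reported above, while `with_schema_id` ends up with schema id `1`.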
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


