HonahX commented on issue #290:
URL: https://github.com/apache/iceberg-python/issues/290#issuecomment-1905147806

   > I noticed that Schema class __eq__ function does not check if the 
schema_ids are equal.
   
   I think this is the intended behavior. We consider two schemas equal if they 
share the same set of fields and identifier fields, as these factors define the 
table structure. In contrast, the schema-id relates more to when the schema was 
added to the table. Say if we want to check if two tables have the same 
structure, we expect `tbl1.schema() == tbl2.schema()` return `True` if both 
tables have the same columns definition, even if two schemas have different 
schema-ids.
   
   Another example: in `update_schema()` we rely on this equality check to 
determine if we need to add the new schema, or we can re-use some previously 
added one:
   
https://github.com/apache/iceberg-python/blob/a56838dc5d9acc5f0e0d70919bfc433c7d0756f1/pyiceberg/table/__init__.py#L1823-L1831
   
   > In test_add_column, it checks if the two schemas are equal using 
schema_id=0 which is wrong, it should be schema_id=1 in L602 since schema 
evolution just happened
   
   You're right. I think for this test, we can remove the `schema_id=0` and 
`identifier_field_ids=[]` in the "expected" schema because they will be 
initialized to these values in default. We can add another assertion to check 
the schema-id. Does this fix sound reasonable to you?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to