syun64 commented on issue #281:
URL: https://github.com/apache/iceberg-python/issues/281#issuecomment-1905187291

   Hi @Fokko - sounds like you beat me to it 😄 Please let me know if you need 
any additional heavy lifting on #284. Happy to help as always.
   
   The reason I was curious whether there's an opportunity to deduplicate code 
here is that the 
[buildReplacement](https://github.com/apache/iceberg/blob/e32df0ce08086758c44e9174c582638068244073/core/src/main/java/org/apache/iceberg/TableMetadata.java#L672)
 code in Java also takes an Iceberg Schema as input. It then compares the new 
schema against the existing schema and reuses the existing ID if a field with 
the same name exists, or [assigns a new 
ID](https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/types/AssignFreshIds.java#L51)
 starting from the next increment after 
[last_column_id](https://github.com/apache/iceberg-python/blob/acfa5bc4e1e0ab167b7d3652438288e9d295f9b6/pyiceberg/table/metadata.py#L153)
 of the table.
   
   On second thought, I'm wondering if it would actually make sense to extend 
the functionality of the PyArrow Schema Visitor we are planning to implement for 
https://github.com/apache/iceberg-python/issues/278 and have the schema visitor 
take the last_column_id and the base Schema as input, so that we reuse the 
existing field ID if the field already exists, and otherwise assign a new field 
ID starting from the next increment after last_column_id. What are your 
thoughts on this idea?
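
   To make the idea concrete, here is a rough sketch of the ID assignment rule 
in plain Python. It is not PyIceberg or PyArrow code: `assign_field_ids` and 
its arguments are hypothetical names used only for illustration, and it only 
handles flat, top-level field names (a real visitor would also need to walk 
nested struct, list and map fields).

```python
from typing import Dict, List, Tuple


def assign_field_ids(
    new_field_names: List[str],
    base_ids: Dict[str, int],
    last_column_id: int,
) -> Tuple[Dict[str, int], int]:
    """Hypothetical helper: reuse IDs by name, otherwise assign fresh ones.

    new_field_names: top-level field names of the incoming schema.
    base_ids: field name -> field ID mapping taken from the existing schema.
    last_column_id: highest field ID already used by the table.
    """
    assigned: Dict[str, int] = {}
    next_id = last_column_id
    for name in new_field_names:
        if name in base_ids:
            # The field already exists in the base schema: keep its ID.
            assigned[name] = base_ids[name]
        else:
            # New field: take the next increment after last_column_id.
            next_id += 1
            assigned[name] = next_id
    return assigned, next_id


# Existing table has id=1 and name=2, so last_column_id is 2.
ids, last_id = assign_field_ids(["id", "name", "email"], {"id": 1, "name": 2}, 2)
print(ids, last_id)  # {'id': 1, 'name': 2, 'email': 3} 3
```

   Returning the updated last_column_id alongside the mapping keeps the helper 
free of side effects, so the caller decides when to write it back to the table 
metadata.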


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
