syun64 commented on issue #281: URL: https://github.com/apache/iceberg-python/issues/281#issuecomment-1905187291
Hi @Fokko, sounds like you beat me to it 😄 Please let me know if you need any additional heavy lifting on #284. Happy to help as always.

The reason I was curious whether there's an opportunity to deduplicate code here is that the [buildReplacement](https://github.com/apache/iceberg/blob/e32df0ce08086758c44e9174c582638068244073/core/src/main/java/org/apache/iceberg/TableMetadata.java#L672) code in Java also takes an Iceberg Schema as input. It then compares the new schema against the existing one. If a field with the corresponding name already exists, it reuses that field's ID. Otherwise it [assigns a new ID](https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/types/AssignFreshIds.java#L51), counting up from the table's [last_column_id](https://github.com/apache/iceberg-python/blob/acfa5bc4e1e0ab167b7d3652438288e9d295f9b6/pyiceberg/table/metadata.py#L153).

On second thought, I'm wondering whether it would make sense to extend the PyArrow schema visitor we are planning to implement for https://github.com/apache/iceberg-python/issues/278 so that it takes the `last_column_id` and the base Schema as inputs. The visitor would then reuse the existing field ID when a field already exists, and assign new field IDs starting after `last_column_id`.

What are your thoughts on this idea?
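To make the proposal concrete, here is a minimal sketch of the ID-assignment step: reuse the base schema's ID when a field with the same fully qualified name exists, otherwise hand out `last_column_id + 1`, `+ 2`, and so on. It uses a hypothetical, simplified `SchemaField` type and hypothetical helpers `index_by_name` / `assign_ids`, not pyiceberg's actual `Schema` or `NestedField` classes, and it only handles nested structs (no list element or map key/value IDs). Fields at one struct level get their IDs before the visitor descends into nested structs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SchemaField:
    """Hypothetical stand-in for an Iceberg field; -1 means 'no ID yet'."""
    name: str
    field_id: int = -1
    children: List["SchemaField"] = field(default_factory=list)


def index_by_name(fields: List[SchemaField], prefix: str = "") -> Dict[str, int]:
    """Map fully qualified (dotted) names in the base schema to their IDs."""
    index: Dict[str, int] = {}
    for f in fields:
        full_name = prefix + f.name
        index[full_name] = f.field_id
        index.update(index_by_name(f.children, full_name + "."))
    return index


def assign_ids(
    new_fields: List[SchemaField],
    base_fields: List[SchemaField],
    last_column_id: int,
) -> Tuple[List[SchemaField], int]:
    """Reuse IDs of fields that exist in the base schema; assign fresh IDs
    starting after last_column_id for the rest. Returns the new fields and
    the updated last_column_id."""
    existing = index_by_name(base_fields)
    next_id = last_column_id

    def visit(fields: List[SchemaField], prefix: str) -> List[SchemaField]:
        nonlocal next_id
        # Assign IDs for every field at this level first...
        ids = []
        for f in fields:
            full_name = prefix + f.name
            if full_name in existing:
                ids.append(existing[full_name])
            else:
                next_id += 1
                ids.append(next_id)
        # ...then descend into nested structs.
        return [
            SchemaField(f.name, fid, visit(f.children, prefix + f.name + "."))
            for f, fid in zip(fields, ids)
        ]

    return visit(new_fields, ""), next_id


if __name__ == "__main__":
    base = [
        SchemaField("id", 1),
        SchemaField("data", 2),
        SchemaField("point", 3, [SchemaField("x", 4), SchemaField("y", 5)]),
    ]
    new = [
        SchemaField("id"),
        SchemaField("point", children=[SchemaField("x"), SchemaField("y"), SchemaField("z")]),
        SchemaField("ts"),
    ]
    fields, last_id = assign_ids(new, base, last_column_id=5)
    print([(f.name, f.field_id) for f in fields], last_id)
```

In this example `id` and `point` keep IDs 1 and 3, the new top-level `ts` gets 6, the new nested `point.z` gets 7, and the returned `last_column_id` is 7. In a real implementation this logic would live inside the PyArrow schema visitor, with the base Schema and `last_column_id` passed in.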