c-thiel commented on PR #587: URL: https://github.com/apache/iceberg-rust/pull/587#issuecomment-2334785757
Thanks for your Feedback @liurenjie1024. This isn't really a refactoring of the builder, it's more a complete rewrite. The old builder allowed to create corrupt metadata in various ways. Splitting it up by `TableUpdate` would not be straight forward - many tests also in other modules depend on building Metadata. Creating Metadata now always goes through builder methods - it's a different architecture that requires basic support for all methods from the beginning just to keep tests running. I would currently prefer to keep it as a larger block mainly because: * I don't have much time currently and its going to be more effort * We would need to write auxiliary code to provide non-checked methods so that crate tests don't fail * The total timespan of merging 10 or so PRs is expected to be much larger than putting ~2 full days effort in from a Rust Maintainer and a Java Maintainer to review it as a block. * Patterns are repetitive and can be reviewed together in many cases * A lot of it are tests - the core builder are 1145 lines. Its long - but doable :) We now have a vision of what it could look like in the end. Before putting any more effort in, we should answer the following questions: * Is the overall structure OK or should we head in a different direction? * My first point of the opening statement: Do we re-write our SortOrder and add the schema to PartitionSpec so that we can match on names like Java does or not? * My second point from the opening statement: How do we handle `new_last_column_id` Those points might change the overall design quite a bit and might require a re-write of `SortOrder` first (split to bound and unbound). After we answered those questions, and we still think splitting makes sense, I can try to find time to build stacked-PRs. Maybe just splitting normalization / validation in `table_metadata.rs` from the actual builder would be a leaner option than splitting every single `TableUpdate`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org