c-thiel opened a new pull request, #645:
URL: https://github.com/apache/iceberg-rust/pull/645

   Fixes https://github.com/apache/iceberg-rust/issues/550
   
   This PR is a result of the issue mentioned above, a [Slack Discussion]( 
https://apache-iceberg.slack.com/archives/C03LG1D563F/p1725878870301909) and 
the preparation for the new MetadataBuilder 
https://github.com/apache/iceberg-rust/pull/587.
   
   Lets first establish, that `PartitionSpec` should have a bound schema 
because:
   * We need to bind the schema to to `PartitonSpec` to allow internal 
partition id handling (see @rdblue comment in slack).
   * It allows for consistent handling of schema related operations, such as 
determining the partition type without returning a `Result`.
   * A PartitionSpec is only valid for the schema is had been build against, 
and we should not imply otherwise.
   
   If we agree on this, then we have a small Problem with `TableMetadata`: It 
contains historic PartitionSpecs that cannot be bound against the 
`current_schema`. As a result, we need a type that has the same properties as 
`PartitionSpec` but is not bound to a schema. I thought we could use 
`UnboundPartitionSpec` for this at first, but it serves a different purpose and 
would make the very important `field_id` Optional.
   
   To solve this, I introduce a `SchemalessPartitionSpec`. Common functions 
between `PartitionSpec` and `SchemalessPartitionSpec` are maid available via a 
common trait. While the trait is public and as such allows to develop functions 
that take either variant as an input, I am exposing the most important 
attributes (i.e. `fields`) still directly, so that the trait typically doesn't 
need to be imported.
   
   For reference, Java solves this by force-binding potentially non-compatible 
schemas to a `PartitionSpec`
   * `TableMetadataBuilder`: 
https://github.com/apache/iceberg/blob/5a2c1c9c8dd8d61e08bbad713c24e6b726eee323/core/src/main/java/org/apache/iceberg/TableMetadata.java#L734
   * `TableMetadataParser`: 
https://github.com/apache/iceberg/blob/5a2c1c9c8dd8d61e08bbad713c24e6b726eee323/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L393
   
   Happy to hear your opinions! This really is a bit of a fiddly topic. I don't 
like to introduce a new type, on the other hand using the API feels more 
natural to me now. This can also be seen in the tests, where we had a 
significant number of places where we passed `PartitionSpec` alongside its 
`Schema` around, just to call `partition_type` eventually.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to