c-thiel commented on issue #694: URL: https://github.com/apache/iceberg-rust/issues/694#issuecomment-2480849988
> Unbound (not bound to a schema) and schemaless are the same, so I find them confusing. They are not quite - a previously bound schema has a required `field_id` (reference to `PartitionField`), while unbound has an optional `field_id` (`Option<i32>`) as it references `UnboundPartitionField`. `TableMetadata.partition_specs` requires `field_id`, so we shouldn't use `UnboundPartitionSpec` there. >> A PartitionSpec is only valid for the schema is had been build against, and we should not imply otherwise. > This is not entirely correct. For example, when a field is being promoted in a valid way, then it is still valid to bind against the same source-id. True, that was too strict. What I should have expressed it that it is not guaranteed to be compatible with the current schema, which is why Java uses `bindUnchecked()`. I think with the current implementation we are quite close to what you propose - binding a `SchemalessPartitionSpec` or `UnboundPartitionSpec` to a new schema is easily possible by using `spec.to_unbound().bind(new_schema)`. Currently this binding can fail. What is missing is the notion that you described with > When the field is not in the current spec anymore, it is also unlikely that you would filter on that field, therefore that field from the partition-spec could also be ignored This would require a more permissive `bind` method, or an additional flag, that would ignore unknown fields, similar to what you try to incorporate into Java with `Any`. So re-binding is already possible today. The `BoundPartitionSpec` serves two additional purposes - it is a way to compute the binding and store the result typesafe. This is useful for 1. Pre-computing the bind, as we currently do for `default_spec` against the `current_schema` - which is the most important combination. 2. Have the means to get the source field names for a partition specs. This is useful when a spec is transferred from one context to another. In Java this is used for `addPartitionSpec` which eventually runs this code: https://github.com/apache/iceberg/blob/acd7cc1126b192ccb53ad8198bda37e983aa4c6c/core/src/main/java/org/apache/iceberg/TableMetadata.java#L768. Note that currently we don't have this semantic in the builder, which in java performs name-mapping to potentially change the `source_id`. Instead we pass an `UnboundSpec` which trusts the provides `source_id` and not the field name. Summing it up, my approach would be to keep all three types - I simply don't see a good way around them. However, we should add a more permissive bind method that incorporates better the semantic of binding to an incompatible schema. This would be an alternative to the `bindUnchecked()` in Java. Maybe we can just follow the PR you are working on in Java once that's ready. This method would not necessarily be able to determine a type for each field in a `spec`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org