toutane opened a new issue, #2658: URL: https://github.com/apache/iceberg-rust/issues/2658
### Is your feature request related to a problem or challenge?

### Background

`compute_identity_cols` in `crates/integrations/datafusion/src/table/bucketing.rs` (added in #2298) returns `None`, which forces the eager scan to declare `UnknownPartitioning` whenever a table has more than one historical partition spec. This is safe but stricter than iceberg-java, which intersects the identity fields present across all specs (`Partitioning.groupingKeyType` / `commonActiveFieldIds`) and still reports a grouping key on the columns that are identity-partitioned in *every* spec.

### Why it's conservative today

The eager bucketing path hashes each task on the partition-tuple slot that matches the table's **default** spec. Under spec evolution, older files carry a partition tuple whose slot order does not necessarily align with the default spec, and `FileScanTask` does not currently carry its own spec id to disambiguate. A per-column intersection was attempted in e0d6add and reverted in f25c911 as out of scope for #2298.

### Describe the solution you'd like

Match iceberg-java: compute the intersection of identity-source fields common to every spec and declare `Partitioning::Hash` on those columns, resolving each task's partition slot via its own spec id rather than assuming the default spec's slot order.

Follow-up to #2298.

### Willingness to contribute

I would be willing to contribute to this feature with guidance from the Iceberg Rust community
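
A rough sketch of the idea, to make the proposal concrete. It is not the iceberg-rust or iceberg-java implementation: `SpecField`, `Spec`, `common_identity_source_ids`, and `slot_for` are hypothetical, simplified stand-ins, and the sketch assumes each scan task can report the id of the spec it was written under.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical, simplified partition field (not the iceberg-rust type).
#[derive(Debug, Clone)]
pub struct SpecField {
    /// Id of the source column in the table schema.
    pub source_id: i32,
    /// True when the transform is `identity`.
    pub is_identity: bool,
}

/// Hypothetical, simplified partition spec: field order == partition-tuple slot order.
#[derive(Debug, Clone)]
pub struct Spec {
    pub spec_id: i32,
    pub fields: Vec<SpecField>,
}

/// Source column ids that are identity-partitioned in *every* spec.
/// Returns an empty set when there are no specs or no common identity column.
pub fn common_identity_source_ids(specs: &[Spec]) -> BTreeSet<i32> {
    let mut per_spec = specs.iter().map(|spec| {
        spec.fields
            .iter()
            .filter(|f| f.is_identity)
            .map(|f| f.source_id)
            .collect::<BTreeSet<i32>>()
    });
    let first = match per_spec.next() {
        Some(set) => set,
        None => return BTreeSet::new(),
    };
    // Intersect the first spec's identity columns with every later spec's.
    per_spec.fold(first, |acc, set| acc.intersection(&set).copied().collect())
}

/// Resolve the partition-tuple slot for `source_id` using the task's own spec,
/// instead of assuming the default spec's slot order.
pub fn slot_for(
    specs_by_id: &HashMap<i32, Spec>,
    task_spec_id: i32,
    source_id: i32,
) -> Option<usize> {
    specs_by_id
        .get(&task_spec_id)?
        .fields
        .iter()
        .position(|f| f.is_identity && f.source_id == source_id)
}
```

With two specs where one column stays identity-partitioned but moves to a different slot, the intersection still contains that column, and `slot_for` returns a different index per spec. That slot mismatch is what hashing on the default spec's slot order would get wrong.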
