toutane opened a new issue, #2658:
URL: https://github.com/apache/iceberg-rust/issues/2658

   ### Is your feature request related to a problem or challenge?
   
   ### Background
   
   `compute_identity_cols` in 
`crates/integrations/datafusion/src/table/bucketing.rs` (added in #2298) 
returns `None` whenever a table has more than one historical partition spec, 
which forces the eager scan to declare `UnknownPartitioning`.
   
   This is safe but stricter than iceberg-java, which intersects the identity 
fields present across all specs (`Partitioning.groupingKeyType` / 
`commonActiveFieldIds`) and still reports a grouping key on the columns that 
are identity-partitioned in *every* spec.
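
A minimal sketch of the intersection described above, assuming simplified stand-in types. `Transform`, `PartitionField`, `PartitionSpec`, and `common_identity_source_ids` are hypothetical names for illustration, not the iceberg-rust or iceberg-java API, and keying the intersection by source column id is an assumption of this sketch:

```rust
use std::collections::BTreeSet;

// Hypothetical, simplified stand-ins for illustration only;
// these are NOT the iceberg-rust types.
#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Transform {
    Identity,
    Bucket(u32),
    Day,
}

#[derive(Clone, Debug)]
struct PartitionField {
    source_id: i32,
    transform: Transform,
}

#[derive(Clone, Debug)]
struct PartitionSpec {
    fields: Vec<PartitionField>,
}

/// Returns the source column ids that are identity-partitioned in every spec.
/// An empty slice of specs yields an empty set.
fn common_identity_source_ids(specs: &[PartitionSpec]) -> BTreeSet<i32> {
    // One set of identity source ids per spec.
    let mut per_spec = specs.iter().map(|spec| {
        spec.fields
            .iter()
            .filter(|f| f.transform == Transform::Identity)
            .map(|f| f.source_id)
            .collect::<BTreeSet<i32>>()
    });

    let Some(first) = per_spec.next() else {
        return BTreeSet::new();
    };

    // Keep only the ids present in every subsequent spec.
    per_spec.fold(first, |acc, ids| acc.intersection(&ids).copied().collect())
}
```

If the resulting set is empty, the scan would presumably fall back to `UnknownPartitioning`, as it does today.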
   
   ### Why it's conservative today
   
   The eager bucketing path hashes each task on the partition-tuple slot that 
matches the table's **default** spec. Under spec evolution, older files carry a 
partition tuple whose slot order does not necessarily align with the default 
spec, and `FileScanTask` does not currently carry its own spec id to 
disambiguate. A per-column intersection was attempted in e0d6add and reverted 
in f25c911 as out of scope for #2298.
   
   ### Describe the solution you'd like
   
   Match iceberg-java: compute the intersection of identity-source fields 
common to every spec and declare `Partitioning::Hash` on those columns, 
resolving each task's partition slot via its own spec id rather than assuming 
the default spec's slot order.
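
The slot resolution could look roughly like the sketch below. It assumes each task carries a `spec_id`, which `FileScanTask` does not do today according to the section above. `Task`, `PartitionSpec`, `PartitionField`, and `grouping_key` are hypothetical names, and partition values are modeled as plain strings for brevity:

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for illustration only;
// these are NOT the iceberg-rust types.
#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Transform {
    Identity,
    Bucket(u32),
}

struct PartitionField {
    source_id: i32,
    transform: Transform,
}

struct PartitionSpec {
    fields: Vec<PartitionField>,
}

/// Hypothetical scan task. `spec_id` is the assumed addition; `partition`
/// holds one value per field of the spec the file was written with.
struct Task {
    spec_id: i32,
    partition: Vec<String>,
}

/// Builds the grouping key for `task` by looking up each key column's slot
/// in the task's OWN spec rather than in the table's default spec.
/// Returns `None` if the spec is unknown or a key column is not
/// identity-partitioned in that spec.
fn grouping_key<'a>(
    task: &'a Task,
    specs: &HashMap<i32, PartitionSpec>,
    key_source_ids: &[i32],
) -> Option<Vec<&'a str>> {
    let spec = specs.get(&task.spec_id)?;
    key_source_ids
        .iter()
        .map(|wanted| {
            // Slot index of the identity field for this source column.
            let slot = spec.fields.iter().position(|f| {
                f.transform == Transform::Identity && f.source_id == *wanted
            })?;
            task.partition.get(slot).map(String::as_str)
        })
        .collect()
}
```

With this lookup, a file written under an older spec and a file written under the default spec produce the same key for the same column value even when the column sits in different tuple slots, so both would hash to the same partition.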
   
   Follow-up to #2298.
   
   ### Willingness to contribute
   
   I would be willing to contribute to this feature with guidance from the 
Iceberg Rust community.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

