gene-bordegaray opened a new pull request, #22207:
URL: https://github.com/apache/datafusion/pull/22207

   ## Which issue does this PR close?
   - First mechanical PR for `ExprPartitioning`, as described in the discussion thread #21992.
   
   ## Rationale for this change
   
   DataFusion cannot currently represent some partitioning schemes accurately. For example, range-partitioned data advertises itself as `Partitioning::Hash` only to avoid repartitioning. Because the declared partitioning does not describe how rows are actually distributed, later optimizer decisions built on it are brittle.
   
   This PR introduces expression-based physical partitioning metadata so that sources can eventually describe partition membership with predicates. It intentionally leaves optimizer and execution semantics unimplemented: those are deferred to follow-up PRs so that the shape of the partitioning API can be planned carefully first.
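
   To illustrate how predicates can describe partition membership, here is a minimal, self-contained Rust sketch. It does not use DataFusion's real types: `Pred`, `ExprPartitioning`, `partition_of`, and `check_contract` are hypothetical simplifications, and the sketch only models half-open integer range predicates on one column rather than general predicate expressions. It shows range-partitioned data described truthfully, and checks that every emitted row matches exactly one predicate, namely the one belonging to the partition that emitted it.

   ```rust
   /// Hypothetical stand-in for a partition predicate: `lo <= row[col] < hi`.
   /// A real implementation would hold general predicate expressions.
   #[derive(Debug, Clone, PartialEq)]
   enum Pred {
       Range { col: usize, lo: i64, hi: i64 },
   }

   impl Pred {
       /// Evaluates the predicate against one row of integer values.
       fn eval(&self, row: &[i64]) -> bool {
           match self {
               Pred::Range { col, lo, hi } => {
                   let v = row[*col];
                   *lo <= v && v < *hi
               }
           }
       }
   }

   /// Hypothetical sketch: one predicate per output partition.
   #[derive(Debug, Clone, PartialEq)]
   struct ExprPartitioning {
       exprs: Vec<Pred>,
   }

   impl ExprPartitioning {
       /// Index of the single partition whose predicate matches `row`, or `None`
       /// if zero or several predicates match (both violate the contract).
       fn partition_of(&self, row: &[i64]) -> Option<usize> {
           let mut hits = self
               .exprs
               .iter()
               .enumerate()
               .filter(|(_, p)| p.eval(row))
               .map(|(i, _)| i);
           let first = hits.next()?;
           if hits.next().is_some() {
               None
           } else {
               Some(first)
           }
       }

       /// `outputs[i]` holds the rows emitted by partition `i`. Returns true only
       /// if every row matches exactly one predicate and was emitted by that partition.
       fn check_contract(&self, outputs: &[Vec<Vec<i64>>]) -> bool {
           outputs.len() == self.exprs.len()
               && outputs
                   .iter()
                   .enumerate()
                   .all(|(i, rows)| rows.iter().all(|row| self.partition_of(row) == Some(i)))
       }
   }
   ```

   With the ranges `[0, 10)` and `[10, 20)` on column 0, a row whose value is `10` maps to partition 1 and a row whose value is `25` maps to no partition. Half-open ranges are used so that adjacent partitions cannot both claim a boundary value.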
   
   ## What changes are included in this PR?
   
   - Adds `Partitioning::Expr(ExprPartitioning)` to the physical partitioning enum.
   - Adds `ExprPartitioning`, which holds one partition predicate expression per output partition.
   - Documents the contract: each emitted row must match exactly one partition expression and must be emitted by that partition. The source declaring this partitioning is responsible for upholding the contract; otherwise results may be incorrect.
   - Adds conservative projection behavior:
       - preserve `ExprPartitioning` only when all partition expressions can be remapped through the projection
       - otherwise degrade to `UnknownPartitioning`
   - Returns `not_impl_err!` at call sites where expression-partitioning semantics are not implemented yet.
   - Adds proto serialization/deserialization.
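
   The conservative projection rule and the `not_impl_err!` fallback can be sketched in the same simplified style. `Pred`, `Partitioning`, `project`, `PlanError`, and `satisfies_hash_requirement` are hypothetical stand-ins rather than DataFusion's API: `PlanError::NotImplemented` plays the role of the error that `not_impl_err!` produces, and keeping the same partition count when degrading to `UnknownPartitioning` is this sketch's assumption.

   ```rust
   /// Hypothetical stand-in for a partition predicate: `lo <= row[col] < hi`.
   #[derive(Debug, Clone, PartialEq)]
   enum Pred {
       Range { col: usize, lo: i64, hi: i64 },
   }

   /// Hypothetical, trimmed-down partitioning enum (not DataFusion's).
   #[derive(Debug, Clone, PartialEq)]
   enum Partitioning {
       /// Only the partition count is known.
       UnknownPartitioning(usize),
       /// One predicate per output partition.
       Expr(Vec<Pred>),
   }

   /// Stand-in for the error that `not_impl_err!` would produce.
   #[derive(Debug, PartialEq)]
   enum PlanError {
       NotImplemented(String),
   }

   impl Partitioning {
       fn partition_count(&self) -> usize {
           match self {
               Partitioning::UnknownPartitioning(n) => *n,
               Partitioning::Expr(exprs) => exprs.len(),
           }
       }

       /// `projection[j]` is the input column that becomes output column `j`.
       /// `Expr` survives only if every predicate's column is still projected;
       /// otherwise it degrades to `UnknownPartitioning` with the same
       /// partition count (an assumption of this sketch).
       fn project(&self, projection: &[usize]) -> Partitioning {
           let Partitioning::Expr(exprs) = self else {
               return self.clone();
           };
           // Collecting into Option<Vec<_>> yields None if any remap fails.
           let remapped: Option<Vec<Pred>> = exprs
               .iter()
               .map(|p| match p {
                   Pred::Range { col, lo, hi } => projection
                       .iter()
                       .position(|c| c == col)
                       .map(|new_col| Pred::Range { col: new_col, lo: *lo, hi: *hi }),
               })
               .collect();
           match remapped {
               Some(exprs) => Partitioning::Expr(exprs),
               None => Partitioning::UnknownPartitioning(self.partition_count()),
           }
       }
   }

   /// Hypothetical call site whose semantics are not defined for `Expr` yet:
   /// it returns an error instead of guessing.
   fn satisfies_hash_requirement(p: &Partitioning) -> Result<bool, PlanError> {
       match p {
           Partitioning::UnknownPartitioning(_) => Ok(false),
           Partitioning::Expr(_) => Err(PlanError::NotImplemented(
               "expression partitioning is not supported here yet".to_string(),
           )),
       }
   }
   ```

   For predicates on input column 2, projecting columns `[2, 0]` keeps the partitioning (column 2 becomes output column 0), while projecting `[0, 1]` drops column 2, so the sketch falls back to `UnknownPartitioning(2)`. Returning an error at call sites without defined semantics is safer than silently treating `Expr` like another variant.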
   
   ## Are these changes tested?
   
   Yes.
   
   ## Are there any user-facing changes?
   
   Yes, additive only. This adds a new public physical partitioning variant and a new public type:
   - `Partitioning::Expr`
   - `ExprPartitioning`