gene-bordegaray opened a new pull request, #22207:
URL: https://github.com/apache/datafusion/pull/22207
## Which issue does this PR close?
- First mechanical PR for `ExprPartitioning` as described in thread: #21992.
## Rationale for this change
DataFusion currently cannot represent some partitioning schemes truthfully.
For example, range-partitioned data currently advertises itself as
`Partitioning::Hash` only to avoid repartitioning, which makes later optimizer
decisions brittle.
This PR introduces expression-based physical partitioning metadata so
sources can eventually describe partition membership with predicates. It
intentionally leaves optimizer and execution semantics unimplemented; those
are deferred to follow-up PRs so that the shape of the partitioning API can be
planned carefully.
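As a rough illustration of "describing partition membership with predicates", here is a toy sketch. It is not the code in this PR: `RangePredicate`, `ToyExprPartitioning`, `partition_of`, and `example_layout` are hypothetical names, and a predicate is simplified to a half-open range over a single `i64` key rather than a physical expression.

```rust
// Toy model only: these types are hypothetical and are NOT DataFusion's API.
// A real partition expression would be a physical expression evaluated on
// record batches; here a predicate is just a half-open range over one i64 key.

/// One predicate per output partition: `lo <= key < hi`.
#[derive(Debug, Clone, PartialEq)]
pub struct RangePredicate {
    pub lo: i64,
    pub hi: i64,
}

impl RangePredicate {
    pub fn matches(&self, key: i64) -> bool {
        self.lo <= key && key < self.hi
    }
}

/// Hypothetical stand-in for expression-based partitioning metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyExprPartitioning {
    pub predicates: Vec<RangePredicate>,
}

impl ToyExprPartitioning {
    /// Returns the index of the single partition whose predicate matches
    /// `key`, or `None` if zero or more than one predicate matches.
    pub fn partition_of(&self, key: i64) -> Option<usize> {
        let mut hits = self
            .predicates
            .iter()
            .enumerate()
            .filter(|(_, p)| p.matches(key))
            .map(|(i, _)| i);
        match (hits.next(), hits.next()) {
            (Some(i), None) => Some(i),
            _ => None,
        }
    }
}

/// Builds a three-partition layout: [0, 100), [100, 200), [200, 300).
pub fn example_layout() -> ToyExprPartitioning {
    ToyExprPartitioning {
        predicates: vec![
            RangePredicate { lo: 0, hi: 100 },
            RangePredicate { lo: 100, hi: 200 },
            RangePredicate { lo: 200, hi: 300 },
        ],
    }
}
```

In this model a range-partitioned source can state its layout directly instead of claiming a hash distribution it does not have, and a key that falls outside every range (or inside two overlapping ones) is detectable as a violation.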
## What changes are included in this PR?
- Adds `Partitioning::Expr(ExprPartitioning)` to the physical partitioning
enum.
- Adds `ExprPartitioning`, representing one partition predicate expression
per output partition.
- Documents the contract: each emitted row must match exactly one partition
expression and must be emitted by the partition that owns that expression. The
source declaring this partitioning is responsible for upholding the contract;
results may be incorrect otherwise.
- Adds conservative projection behavior:
  - preserve `ExprPartitioning` only when all partition expressions can be
    remapped
  - otherwise degrade to `UnknownPartitioning`
- Adds `not_impl_err!` at call-sites where expression partitioning semantics
are not implemented yet.
- Adds proto serialization/deserialization.
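
The conservative projection rule listed above can be sketched in isolation. This is a simplified model under stated assumptions, not the implementation in this PR: `ToyPartitioning`, `remap_columns`, and `project` are hypothetical names, and each partition expression is reduced to the list of input column indices it references.

```rust
// Toy model only: hypothetical types, NOT DataFusion's actual definitions.
// Each partition expression is reduced to the input column indices it uses.

#[derive(Debug, Clone, PartialEq)]
pub enum ToyPartitioning {
    /// One expression per output partition, each listing referenced columns.
    Expr(Vec<Vec<usize>>),
    /// Only the partition count is known.
    Unknown(usize),
}

/// `projection[i]` is the input column index that becomes output column `i`.
/// Returns `None` if any referenced column is dropped by the projection.
fn remap_columns(cols: &[usize], projection: &[usize]) -> Option<Vec<usize>> {
    cols.iter()
        .map(|c| projection.iter().position(|p| p == c))
        .collect()
}

/// Preserve expression partitioning only if every expression can be
/// rewritten against the projected schema; otherwise keep just the count.
pub fn project(partitioning: &ToyPartitioning, projection: &[usize]) -> ToyPartitioning {
    match partitioning {
        ToyPartitioning::Expr(exprs) => {
            let remapped: Option<Vec<Vec<usize>>> = exprs
                .iter()
                .map(|cols| remap_columns(cols, projection))
                .collect();
            match remapped {
                Some(new_exprs) => ToyPartitioning::Expr(new_exprs),
                None => ToyPartitioning::Unknown(exprs.len()),
            }
        }
        ToyPartitioning::Unknown(n) => ToyPartitioning::Unknown(*n),
    }
}
```

Degrading to an unknown partitioning when even one expression cannot be remapped errs on the side of safety: the plan may lose an optimization opportunity, but it never advertises a property that the projected output can no longer express.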
## Are these changes tested?
Yes.
## Are there any user-facing changes?
Yes, additive only. This adds a public physical partitioning variant and
public type:
- `Partitioning::Expr`
- `ExprPartitioning`