avantgardnerio opened a new pull request, #23094:
URL: https://github.com/apache/datafusion/pull/23094
## Which issue does this PR close?
Implements the proposal in #23093. (Not using `Closes #23093` so the
discussion thread can stay open for the broader API conversation.)
## Rationale for this change
See #23093 for full design rationale. Short version: `Partitioning::Range`
(landed in #22207) covers the *declarative* case where split points are known
at plan time. This adds the symmetric *runtime-discovered* sibling — where the
boundary set is only known once an upstream operator has observed its actual
data range. The partition count stays fixed at plan time so downstream
distribution requirements have a stable answer; only the split point values are
runtime-discovered.
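To make the plan-time / runtime split concrete, here is a minimal sketch. It is not code from this PR, which adds no execution path. It assumes plain `i64` keys and evenly spaced boundaries; `split_points` and `route` are hypothetical helpers, and a real operator might derive boundaries differently (for example from samples).

```rust
/// Hypothetical helper: derive `partition_count - 1` evenly spaced split
/// points from an observed [min, max] range. The partition count is the
/// plan-time input; only the boundary values depend on observed data.
fn split_points(min: i64, max: i64, partition_count: usize) -> Vec<i64> {
    assert!(partition_count > 0 && min <= max);
    // Widen to i128 so `max - min` cannot overflow.
    let n = partition_count as i128;
    let width = max as i128 - min as i128;
    (1..n)
        .map(|i| (min as i128 + width * i / n) as i64)
        .collect()
}

/// Hypothetical helper: index of the partition that owns `value`.
/// Partition `i` holds values `v` with `splits[i - 1] <= v < splits[i]`;
/// values outside the observed range fall into the first or last partition.
fn route(value: i64, splits: &[i64]) -> usize {
    // `splits` is sorted, so the predicate is true for a prefix only.
    splits.partition_point(|s| *s <= value)
}
```

With `partition_count = 4` and an observed range of `0..=100`, `split_points` yields three boundaries, and `route` always returns an index in `0..4` regardless of the data, which is what lets downstream distribution requirements be answered at plan time.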
Motivating downstream use case: parallelizing single-partition window
functions (RANGE frames, no PARTITION BY) — see the spike at #23026.
## What changes are included in this PR?
Variant introduction only — no execution slot in this PR.
```rust
pub enum Partitioning {
    // ...
    Range(RangePartitioning),
    DynamicRange(DynamicRangePartitioning), // <- new
    UnknownPartitioning(usize),
}

pub struct DynamicRangePartitioning {
    ordering: LexOrdering,
    partition_count: usize,
}
```
Behavior mirrors `Range` at every match site:
- `Partitioning::partition_count`, `compatible_with`, `project`,
`PartialEq`, `Display` arms added.
- `DynamicRangePartitioning` has `new`, `ordering`, `partition_count`,
`compatible_with`, `project`, `Display` mirroring `RangePartitioning` (minus
split-point validation, since there are no split points to validate at plan
time).
- `RepartitionExec`'s `repartitioned()`, `try_pushdown_sort()`, and
projection-pushdown sites return `not_impl_err!` for the new variant, same as
they already do for `Range`.
- `RepartitionExec`'s row-routing path was already a catch-all
`not_impl_err!` for non-Hash / non-RoundRobin variants, so no change is needed
there.
- The FFI layer maps the new variant to `UnknownPartitioning(n)`, the same
path `Range` takes per #22394.
- Proto serialization returns `not_impl_err!` — proto plumbing for
`DynamicRange` will be added incrementally, mirroring how `Range` landed in
steps (#22207 → #22787).
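The "mirror `Range` at every match site" pattern can be sketched roughly as follows. These are simplified stand-ins, not DataFusion's real definitions: the structs drop the `LexOrdering`, the split points are plain `i64`, the "one more partition than split points" rule is an assumption of this sketch, and `route_rows` is a hypothetical placeholder for a row-routing entry point that returns a `String` error instead of `not_impl_err!`.

```rust
// Simplified stand-ins, NOT the real DataFusion types.
#[derive(Debug, Clone, PartialEq)]
pub struct RangePartitioning {
    pub split_points: Vec<i64>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct DynamicRangePartitioning {
    pub partition_count: usize,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Partitioning {
    RoundRobinBatch(usize),
    Range(RangePartitioning),
    DynamicRange(DynamicRangePartitioning),
    UnknownPartitioning(usize),
}

impl Partitioning {
    pub fn partition_count(&self) -> usize {
        match self {
            Partitioning::RoundRobinBatch(n) | Partitioning::UnknownPartitioning(n) => *n,
            // Assumption for this sketch: a declared range has one more
            // partition than it has split points.
            Partitioning::Range(r) => r.split_points.len() + 1,
            // The dynamic variant has no split points yet, so it stores
            // the count directly.
            Partitioning::DynamicRange(d) => d.partition_count,
        }
    }
}

/// Hypothetical routing entry point: only round-robin is handled here, and
/// the catch-all arm rejects everything else, so a newly added variant is
/// rejected without touching this function.
pub fn route_rows(p: &Partitioning) -> Result<usize, String> {
    match p {
        Partitioning::RoundRobinBatch(n) => Ok(*n),
        other => Err(format!("not implemented: row routing for {other:?}")),
    }
}
```

The exhaustive `match` in `partition_count` is the kind of site that needs a new arm when a variant is added, while the catch-all in `route_rows` is the kind that needs no change.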
## Are these changes tested?
Three new tests in `datafusion/physical-expr/src/partitioning.rs::tests`:
- `test_dynamic_range_partitioning_metadata` — construction, `Display`,
`partition_count`, accessors.
- `test_dynamic_range_partitioning_compatible_with` — same ordering and same
`partition_count` → compatible; different `partition_count` → not compatible;
different sort options → not compatible; single-partition vs.
single-partition → always compatible. Also exercised through the
`Partitioning` enum, including the cross-variant case (`DynamicRange` vs.
declared `Range` is never compatible).
- `test_dynamic_range_partitioning_project_preserves_or_degrades` —
projection preserves the ordering when the key survives; degrades to
`UnknownPartitioning(n)` (preserving partition count) when the key is dropped.
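The semantics these tests describe can be sketched as below. This is an illustration under stated assumptions, not the PR's implementation: `SortKey` (a column name plus a descending flag) stands in for `LexOrdering`, `Projected` stands in for the `Partitioning` enum, and projection is modeled as a list of surviving column names.

```rust
// Simplified stand-ins, NOT the real DataFusion types.
#[derive(Debug, Clone, PartialEq)]
pub struct SortKey {
    pub column: String,
    pub descending: bool,
}

#[derive(Debug, Clone, PartialEq)]
pub struct DynamicRange {
    pub ordering: Vec<SortKey>,
    pub partition_count: usize,
}

/// Hypothetical result of projecting a `DynamicRange`.
#[derive(Debug, Clone, PartialEq)]
pub enum Projected {
    DynamicRange(DynamicRange),
    Unknown(usize),
}

impl DynamicRange {
    pub fn compatible_with(&self, other: &DynamicRange) -> bool {
        // With a single partition on both sides there is nothing to
        // co-locate, so the ordering does not matter.
        if self.partition_count == 1 && other.partition_count == 1 {
            return true;
        }
        self.partition_count == other.partition_count && self.ordering == other.ordering
    }

    pub fn project(&self, kept_columns: &[&str]) -> Projected {
        // The partitioning survives only if every key column survives.
        let survives = self
            .ordering
            .iter()
            .all(|k| kept_columns.iter().any(|c| *c == k.column.as_str()));
        if survives {
            Projected::DynamicRange(self.clone())
        } else {
            // Degrade, but keep the partition count.
            Projected::Unknown(self.partition_count)
        }
    }
}
```

For a `DynamicRange` over `ts` with 4 partitions, projecting onto `["ts", "v"]` keeps the partitioning, while projecting onto `["v"]` degrades to `Projected::Unknown(4)`.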
`cargo clippy --all-features --all-targets -- -D warnings --no-deps` clean.
`cargo fmt --all` clean.
`cargo test -p datafusion-physical-expr --lib partitioning::`: 20 pass (3 new).
## Are there any user-facing changes?
- New public API in `datafusion::physical_expr`: the
`DynamicRangePartitioning` struct and the `Partitioning::DynamicRange` enum
variant.
- No changes to existing variants. Existing match sites on `Partitioning`
may need a new arm — most upstream code already had one for `Range` and can
extend it. The crates updated in this PR (`physical-plan`, `proto`, `ffi`)
cover the in-tree consumers.
- No SQL surface changes.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]