avantgardnerio opened a new pull request, #23094:
URL: https://github.com/apache/datafusion/pull/23094

   ## Which issue does this PR close?
   
   Implements the proposal in #23093. (Not using `Closes #23093` so the 
discussion thread can stay open for the broader API conversation.)
   
   ## Rationale for this change
   
   See #23093 for full design rationale. Short version: `Partitioning::Range` 
(landed in #22207) covers the *declarative* case where split points are known 
at plan time. This adds the symmetric *runtime-discovered* sibling — where the 
boundary set is only known once an upstream operator has observed its actual 
data range. The partition count stays fixed at plan time so downstream 
distribution requirements have a stable answer; only the split point values are 
runtime-discovered.
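
To make the "fixed count, runtime-discovered split points" idea concrete, here is a minimal sketch. It is not code from this PR: `discover_split_points` and `route` are hypothetical helpers, the key is simplified to a single `i64` column, and picking evenly spaced quantiles from the observed values is just one assumed strategy.

```rust
/// Hypothetical helper (not part of this PR): pick `partition_count - 1`
/// split points from values observed at runtime. The partition count is
/// an input fixed at plan time; only the split point values depend on data.
fn discover_split_points(mut observed: Vec<i64>, partition_count: usize) -> Vec<i64> {
    assert!(partition_count > 0, "partition_count must be non-zero");
    observed.sort_unstable();
    if observed.is_empty() {
        // Nothing observed: no split points, so every row lands in partition 0.
        return Vec::new();
    }
    // Evenly spaced quantiles of the sorted sample (an assumed strategy).
    (1..partition_count)
        .map(|i| observed[i * observed.len() / partition_count])
        .collect()
}

/// Hypothetical helper: map a value to a partition index in
/// `0..=split_points.len()`. `split_points` must be sorted ascending.
fn route(value: i64, split_points: &[i64]) -> usize {
    // Counts the split points that are <= value, so a value equal to a
    // split point goes to the partition on its right.
    split_points.partition_point(|&s| s <= value)
}
```

With four partitions, a downstream operator can rely on the count `4` during planning even though the three split points only exist once the data has been seen.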
   
   Motivating downstream use case: parallelizing single-partition window 
functions (RANGE frames, no PARTITION BY) — see the spike at #23026.
   
   ## What changes are included in this PR?
   
   This PR only introduces the variant; it adds no execution support (row routing) for it.
   
   ```rust
   pub enum Partitioning {
       ...
       Range(RangePartitioning),
       DynamicRange(DynamicRangePartitioning),  // <- new
       UnknownPartitioning(usize),
   }
   
   pub struct DynamicRangePartitioning {
       ordering: LexOrdering,
       partition_count: usize,
   }
   ```
   
   Behavior mirrors `Range` at every match site:
   
   - `Partitioning::partition_count`, `compatible_with`, `project`, 
`PartialEq`, `Display` arms added.
   - `DynamicRangePartitioning` has `new`, `ordering`, `partition_count`, 
`compatible_with`, `project`, `Display` mirroring `RangePartitioning` (minus 
split-point validation, since there are no split points to validate at plan 
time).
   - `RepartitionExec`'s `repartitioned()`, `try_pushdown_sort()`, and 
projection-pushdown sites return `not_impl_err!` for the new variant, same as 
they already do for `Range`.
   - `RepartitionExec`'s row-routing path was already a catch-all 
`not_impl_err!` for non-Hash / non-RoundRobin variants, so no change is needed 
there.
   - FFI bridges to `UnknownPartitioning(n)`, same path `Range` takes per 
#22394.
   - Proto serialization returns `not_impl_err!` — proto plumbing for 
`DynamicRange` will be added incrementally, mirroring how `Range` landed in 
steps (#22207 → #22787).
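
The compatibility rules described in this PR can be sketched with stand-in types. Everything below is illustrative: `SortKey`, `DynamicRangeSketch`, and `PartitioningSketch` are hypothetical simplifications of `LexOrdering`, `DynamicRangePartitioning`, and `Partitioning`, and only the `DynamicRange` pairings are modeled.

```rust
/// Stand-in for one sort expression plus its sort options (hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct SortKey {
    column: String,
    descending: bool,
    nulls_first: bool,
}

/// Stand-in for `DynamicRangePartitioning`: an ordering and a fixed count,
/// with no split points at plan time.
#[derive(Debug, Clone, PartialEq)]
struct DynamicRangeSketch {
    ordering: Vec<SortKey>,
    partition_count: usize,
}

impl DynamicRangeSketch {
    fn compatible_with(&self, other: &Self) -> bool {
        // Two single-partition layouts are always compatible.
        if self.partition_count == 1 && other.partition_count == 1 {
            return true;
        }
        // Otherwise both the count and the full ordering
        // (including sort options) must match.
        self.partition_count == other.partition_count && self.ordering == other.ordering
    }
}

/// Stand-in for the `Partitioning` enum, reduced to two variants.
#[derive(Debug, Clone, PartialEq)]
enum PartitioningSketch {
    Range { ordering: Vec<SortKey>, split_points: Vec<i64> },
    DynamicRange(DynamicRangeSketch),
}

impl PartitioningSketch {
    fn compatible_with(&self, other: &Self) -> bool {
        match (self, other) {
            (Self::DynamicRange(a), Self::DynamicRange(b)) => a.compatible_with(b),
            // Cross-variant pairs are never compatible. Range-vs-Range is
            // not modeled in this sketch and also falls through to `false`.
            _ => false,
        }
    }
}
```

Treating a declared `Range` and a `DynamicRange` as incompatible is the conservative choice here: their split points cannot be compared at plan time because one side has none yet.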
   
   ## Are these changes tested?
   
   Three new tests in `datafusion/physical-expr/src/partitioning.rs::tests`:
   
   - `test_dynamic_range_partitioning_metadata`: construction, `Display`, 
`partition_count`, accessors.
   - `test_dynamic_range_partitioning_compatible_with`: same ordering and same 
`partition_count` → compatible; different `partition_count` → not compatible; 
different sort options → not compatible; two single-partition inputs → always 
compatible. The same cases are also checked through the `Partitioning` enum, 
including the cross-variant case (`DynamicRange` vs. declared `Range` is never 
compatible).
   - `test_dynamic_range_partitioning_project_preserves_or_degrades`: 
projection preserves the ordering when the key survives; it degrades to 
`UnknownPartitioning(n)` (preserving the partition count) when the key is dropped.
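
The preserve-or-degrade behavior of `project` can be illustrated as follows. This is a sketch under assumptions, not the PR's implementation: ordering keys are reduced to column names, the projection is modeled as the list of surviving columns (`kept`), and the rule that every key must survive is an assumption of this example.

```rust
/// Stand-in for the `Partitioning` enum (hypothetical, two variants only).
#[derive(Debug, Clone, PartialEq)]
enum PartitioningSketch {
    DynamicRange { ordering: Vec<String>, partition_count: usize },
    Unknown(usize),
}

/// Hypothetical projection: `kept` lists the columns that survive.
fn project(p: &PartitioningSketch, kept: &[&str]) -> PartitioningSketch {
    match p {
        PartitioningSketch::DynamicRange { ordering, partition_count } => {
            if ordering.iter().all(|k| kept.contains(&k.as_str())) {
                // Every ordering key survives: the partitioning is preserved.
                p.clone()
            } else {
                // A key was dropped: the ordering can no longer be expressed,
                // but the number of partitions is still known.
                PartitioningSketch::Unknown(*partition_count)
            }
        }
        PartitioningSketch::Unknown(n) => PartitioningSketch::Unknown(*n),
    }
}
```

Degrading to an unknown partitioning with the same count keeps downstream partition-count reasoning valid even after the key column disappears.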
   
   `cargo clippy --all-features --all-targets -- -D warnings --no-deps` clean. 
`cargo fmt --all` clean. `cargo test -p datafusion-physical-expr --lib 
partitioning::`: 20 pass (3 new).
   
   ## Are there any user-facing changes?
   
   - New public types in `datafusion::physical_expr`: 
`DynamicRangePartitioning`, `Partitioning::DynamicRange` enum variant.
   - No changes to existing variants. Existing match sites on `Partitioning` 
may need a new arm; code that already has an arm for `Range` can usually extend 
it. The crates updated in this PR (`physical-plan`, `proto`, `ffi`) 
cover the in-tree consumers.
   - No SQL surface changes.
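
For code that matches exhaustively on `Partitioning`, the update may be as small as widening an existing `Range` arm. The sketch below uses a hypothetical stand-in enum whose variants carry only a partition count, and `routing_strategy` is an invented function for illustration; real match sites will differ.

```rust
/// Stand-in for the `Partitioning` enum (hypothetical; payloads reduced
/// to a partition count).
#[derive(Debug, Clone, PartialEq)]
enum PartitioningSketch {
    RoundRobin(usize),
    Hash(usize),
    Range(usize),
    DynamicRange(usize),
    Unknown(usize),
}

/// Hypothetical consumer that only knows how to route rows for
/// round-robin and hash partitioning.
fn routing_strategy(p: &PartitioningSketch) -> Result<&'static str, String> {
    match p {
        PartitioningSketch::RoundRobin(_) => Ok("round-robin"),
        PartitioningSketch::Hash(_) => Ok("hash"),
        // The existing `Range` arm is extended with `| DynamicRange`,
        // so both variants report the same "not implemented" error.
        PartitioningSketch::Range(n) | PartitioningSketch::DynamicRange(n) => Err(format!(
            "row routing is not implemented for range partitioning ({n} partitions)"
        )),
        PartitioningSketch::Unknown(n) => Err(format!(
            "cannot route rows into {n} partitions of unknown partitioning"
        )),
    }
}
```

Code that uses a catch-all `_` arm keeps compiling unchanged, but it is worth checking that the fallback behavior is what you want for the new variant.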


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

