zhuqi-lucas commented on issue #22405: URL: https://github.com/apache/datafusion/issues/22405#issuecomment-4545649696
## Status (May 2026)

Phase 1 prototype shipped as #22518:

- ✅ A/B sampling (measure `partial_ns/row` + `passthrough_ns/row` + ratio)
- ✅ Cost crossover decision: `skip ⇔ ratio > passthrough_ns / partial_ns` (derived from the closed-form `cost_keep` vs `cost_skip` comparison, no magic constants)
- ✅ ≤ 1 % overhead (10k passthrough sample per partition; the default config keeps the operator-wide `elapsed_compute` timer doing the measurement, with no extra `Instant::now()` in the hot path)
- ✅ Diagnostic gauges exposed via EXPLAIN ANALYZE: `partial_agg_probe_partial_ns_per_row`, `_passthrough_ns_per_row`, `_ratio_per_mille`, `_cost_decision_skip`
- ✅ ClickBench partitioned (ARM Neoverse-V2): 10 queries faster (Q19 +1.43×, Q39 +1.30×, Q29 +1.23×, Q18 +1.12×, …), 1 minor regression (Q42 ~15 ms, noise), total **−1.5 %**
- ⏳ **Segment-level re-probing is deferred.** Attempted in #22518 but reverted: when the probe re-enters the partial-agg path after a committed skip segment, the operator panics at `multi_group_by/primitive.rs:156` with an out-of-bounds `lhs_row`. It looks like `GroupValues::emit(EmitTo::All)` clears the per-column arrays but the hash→index map retains stale entries from before the emit. That is fine for the existing one-shot skip path, but breaks any path that goes `partial → skip → partial`. Worth tackling as a follow-up once those reset semantics are sorted out.

Will keep this issue open until #22518 merges and we have post-merge benchmark data; the re-probing follow-up should land as a separate PR.
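For readers following along, the crossover rule quoted above can be sketched as below. This is a minimal illustration and not the code in #22518: the `ProbeSample` struct and its methods are hypothetical names, and `ratio` is assumed here to mean output groups divided by input rows (the comment does not define it). The comparison is multiplied through by `partial_ns` so that a zero measurement cannot cause a division by zero.

```rust
/// Hypothetical container for the three probe measurements named in the status.
#[derive(Debug, Clone, Copy)]
struct ProbeSample {
    /// Measured cost of the partial-aggregation path, in ns per input row.
    partial_ns_per_row: f64,
    /// Measured cost of the passthrough (skip) path, in ns per input row.
    passthrough_ns_per_row: f64,
    /// ASSUMPTION: output groups / input rows, in [0, 1].
    ratio: f64,
}

impl ProbeSample {
    /// Applies `skip <=> ratio > passthrough_ns / partial_ns`,
    /// rewritten as `ratio * partial_ns > passthrough_ns`.
    fn should_skip(&self) -> bool {
        if self.partial_ns_per_row <= 0.0 {
            // No usable measurement: keep aggregating.
            return false;
        }
        self.ratio * self.partial_ns_per_row > self.passthrough_ns_per_row
    }

    /// Integer encoding suitable for a gauge such as `_ratio_per_mille`.
    fn ratio_per_mille(&self) -> u64 {
        (self.ratio.clamp(0.0, 1.0) * 1000.0).round() as u64
    }
}

fn main() {
    // Threshold here is 10 / 40 = 0.25.
    let poor_reduction = ProbeSample {
        partial_ns_per_row: 40.0,
        passthrough_ns_per_row: 10.0,
        ratio: 0.9,
    };
    let good_reduction = ProbeSample { ratio: 0.1, ..poor_reduction };

    println!("poor reduction skips: {}", poor_reduction.should_skip());
    println!("good reduction skips: {}", good_reduction.should_skip());
    println!("gauge value: {}", poor_reduction.ratio_per_mille());
}
```

Under this reading, a partition whose passthrough cost is at least its partial-aggregation cost never skips, because the threshold is then 1 or more and the ratio cannot exceed 1.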

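The suspected `partial → skip → partial` failure can be illustrated with a toy model. This is not DataFusion's `GroupValues` and does not reproduce the actual panic; `ToyGroups` and its methods are hypothetical, and the sketch only shows how a key→index map that outlives an emit can hand back indices past the end of the cleared value array.

```rust
use std::collections::HashMap;

/// Toy stand-in for a group-values store (NOT DataFusion's `GroupValues`).
struct ToyGroups {
    /// Per-column array of group keys.
    values: Vec<i64>,
    /// Key -> row index into `values`.
    index_of: HashMap<i64, usize>,
}

impl ToyGroups {
    fn new() -> Self {
        ToyGroups { values: Vec::new(), index_of: HashMap::new() }
    }

    /// Returns the group index for `key`, or `None` if the map holds a stale
    /// index that no longer fits in `values` (code that indexes the array
    /// directly would go out of bounds at this point).
    fn intern(&mut self, key: i64) -> Option<usize> {
        if let Some(&idx) = self.index_of.get(&key) {
            return if idx < self.values.len() { Some(idx) } else { None };
        }
        let idx = self.values.len();
        self.values.push(key);
        self.index_of.insert(key, idx);
        Some(idx)
    }

    /// Emits all groups but leaves the map untouched (the suspected behavior).
    fn emit_all_keep_map(&mut self) -> Vec<i64> {
        std::mem::take(&mut self.values)
    }

    /// Emits all groups and resets the map as well.
    fn emit_all_reset_map(&mut self) -> Vec<i64> {
        self.index_of.clear();
        std::mem::take(&mut self.values)
    }
}

fn main() {
    let mut stale = ToyGroups::new();
    stale.intern(7);
    stale.emit_all_keep_map();
    // Re-entering the partial path sees a stale index for key 7.
    println!("stale map lookup: {:?}", stale.intern(7));

    let mut clean = ToyGroups::new();
    clean.intern(7);
    clean.emit_all_reset_map();
    // With the map reset, key 7 is re-inserted at a valid index.
    println!("reset map lookup: {:?}", clean.intern(7));
}
```

A one-shot skip never calls `intern` again after the emit, which matches the observation that the existing path is unaffected.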