adriangb opened a new pull request, #22298:
URL: https://github.com/apache/datafusion/pull/22298
## Which issue does this PR close?
Supersedes #20837. That PR's branch was rebased onto current `main`; GitHub
does
not allow reopening a closed PR once its branch has been force-pushed, so
this is
a fresh PR carrying the same work, rebased and re-benchmarked.
## Rationale for this change
`Optimizer::optimize` runs every logical optimizer rule over the whole plan,
repeatedly, until a fixed point. Benchmarking the optimizer in isolation (SQL
parsing and analysis excluded) surfaced two avoidable costs:
1. `rewrite_with_subqueries` calls `map_subqueries` at every plan node,
walking
every expression tree via ownership-based `transform_down` — even for
plans
with no subquery expressions at all, which is the common case.
2. The ownership-based `TreeNode::rewrite` traversal performs an
`Arc::unwrap_or_clone` + `Arc::new` cycle at every child node,
re-allocating
the `Arc` spine on every pass even when nothing changes.
## What changes are included in this PR?
Three optimizations:
1. **`map_subqueries` short-circuit** — skip the expression-tree walk when a
node
has no subquery expressions.
2. **`plan_has_subqueries` per-pass check** — when the whole plan has no
subqueries, bypass `rewrite_with_subqueries` entirely and use the cheaper
in-place traversal.
3. **`rewrite_plan_in_place` with `Arc::make_mut`** — a new
`LogicalPlan::map_children_mut` mutates `Arc<LogicalPlan>` children in
place
(copy-on-write; free when the refcount is 1, which it always is in the
optimizer), avoiding the `Arc::unwrap_or_clone` + `Arc::new` cycle. The
owned-plan rule API is bridged with `std::mem::take`, which is
allocation-free because `LogicalPlan::default()` is an `EmptyRelation`
that
shares the process-wide static empty schema.
Also adds optimizer-only benchmarks to `sql_planner.rs` that isolate
optimizer
cost from SQL parsing/analysis.
### Benchmark results (optimizer-only, criterion, this PR vs `main`)
| Benchmark | main | this PR | Change |
|---|---|---|---|
| optimizer_select_one_from_700 | 200 µs | 202 µs | +1% (noise) |
| optimizer_select_all_from_1000 | 4.71 ms | 4.13 ms | **−12%** |
| optimizer_join_chain_4 | 136 µs | 103 µs | **−24%** |
| optimizer_join_chain_8 | 445 µs | 363 µs | **−18%** |
| optimizer_wide_filter_200 | 4.91 ms | 3.47 ms | **−29%** |
| optimizer_wide_aggregate_100 | 2.10 ms | 1.50 ms | **−29%** |
| optimizer_correlated_exists | 187 µs | 185 µs | −1% (noise) |
| optimizer_join_4_with_agg_filter | 384 µs | 276 µs | **−28%** |
| optimizer_tpch_all | 12.25 ms | 9.14 ms | **−25%** |
| optimizer_tpcds_all | 220 ms | 170 ms | **−23%** |
Measured A/B on a single machine: for the baseline run the two optimizer rule
files were reverted to `main` (keeping the new benchmark code), then
restored, so
only the optimization itself is being measured. The `optimizer_tpcds_all`
number
was confirmed across multiple back-to-back runs after an initial reading was
distorted by machine interference.
### Possible future work (not in this PR)
The `mem::take` bridge in `rewrite_plan_in_place` is allocation-free, but it
still
extracts an owned plan for every `(node, rule)` pair up front — before the
rule
decides whether it will transform — and the overwhelming majority of those
pairs
are no-ops. A dynamic "lazy handle" rule API (the rule receives a handle that
derefs to `&LogicalPlan` for free and only pays the copy-on-write / move
when it
calls `make_mut` / `replace`) would let the framework skip the extraction for
no-op rule invocations entirely. That is a breaking change to the public
`OptimizerRule` trait and is out of scope here; it is being prototyped
separately.
## Are these changes tested?
Yes. The existing optimizer test suite (713 tests across
`datafusion-optimizer`)
passes unchanged. The optimizations are behavior-preserving: the in-place
traversal produces the same plans as the ownership-based traversal, and the
subquery short-circuit only skips work that is provably a no-op.
## Are there any user-facing changes?
No breaking API changes. `LogicalPlan::map_children_mut` is a new public
method —
purely additive. Optimization is faster; output plans are unchanged.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]