adriangb opened a new pull request, #22298:
URL: https://github.com/apache/datafusion/pull/22298

   ## Which issue does this PR close?
   
   Supersedes #20837. That PR's branch was rebased onto current `main`; GitHub 
does
   not allow reopening a closed PR once its branch has been force-pushed, so 
this is
   a fresh PR carrying the same work, rebased and re-benchmarked.
   
   ## Rationale for this change
   
   `Optimizer::optimize` runs every logical optimizer rule over the whole plan,
   repeatedly, until a fixed point. Benchmarking the optimizer in isolation (SQL
   parsing and analysis excluded) surfaced two avoidable costs:
   
   1. `rewrite_with_subqueries` calls `map_subqueries` at every plan node, 
walking
      every expression tree via ownership-based `transform_down` — even for 
plans
      with no subquery expressions at all, which is the common case.
   2. The ownership-based `TreeNode::rewrite` traversal performs an
      `Arc::unwrap_or_clone` + `Arc::new` cycle at every child node, 
re-allocating
      the `Arc` spine on every pass even when nothing changes.
   
   ## What changes are included in this PR?
   
   Three optimizations:
   
   1. **`map_subqueries` short-circuit** — skip the expression-tree walk when a 
node
      has no subquery expressions.
   2. **`plan_has_subqueries` per-pass check** — when the whole plan has no
      subqueries, bypass `rewrite_with_subqueries` entirely and use the cheaper
      in-place traversal.
   3. **`rewrite_plan_in_place` with `Arc::make_mut`** — a new
      `LogicalPlan::map_children_mut` mutates `Arc<LogicalPlan>` children in 
place
      (copy-on-write; free when the refcount is 1, which it always is in the
      optimizer), avoiding the `Arc::unwrap_or_clone` + `Arc::new` cycle. The
      owned-plan rule API is bridged with `std::mem::take`, which is
      allocation-free because `LogicalPlan::default()` is an `EmptyRelation` 
that
      shares the process-wide static empty schema.
   
   Also adds optimizer-only benchmarks to `sql_planner.rs` that isolate 
optimizer
   cost from SQL parsing/analysis.
   
   ### Benchmark results (optimizer-only, criterion, this PR vs `main`)
   
   | Benchmark | main | this PR | Change |
   |---|---|---|---|
   | optimizer_select_one_from_700 | 200 µs | 202 µs | +1% (noise) |
   | optimizer_select_all_from_1000 | 4.71 ms | 4.13 ms | **−12%** |
   | optimizer_join_chain_4 | 136 µs | 103 µs | **−24%** |
   | optimizer_join_chain_8 | 445 µs | 363 µs | **−18%** |
   | optimizer_wide_filter_200 | 4.91 ms | 3.47 ms | **−29%** |
   | optimizer_wide_aggregate_100 | 2.10 ms | 1.50 ms | **−29%** |
   | optimizer_correlated_exists | 187 µs | 185 µs | −1% (noise) |
   | optimizer_join_4_with_agg_filter | 384 µs | 276 µs | **−28%** |
   | optimizer_tpch_all | 12.25 ms | 9.14 ms | **−25%** |
   | optimizer_tpcds_all | 220 ms | 170 ms | **−23%** |
   
   Measured A/B on a single machine: for the baseline run the two optimizer rule
   files were reverted to `main` (keeping the new benchmark code), then 
restored, so
   only the optimization itself is being measured. The `optimizer_tpcds_all` 
number
   was confirmed across multiple back-to-back runs after an initial reading was
   distorted by machine interference.
   
   ### Possible future work (not in this PR)
   
   The `mem::take` bridge in `rewrite_plan_in_place` is allocation-free, but it 
still
   extracts an owned plan for every `(node, rule)` pair up front — before the 
rule
   decides whether it will transform — and the overwhelming majority of those 
pairs
   are no-ops. A dynamic "lazy handle" rule API (the rule receives a handle that
   derefs to `&LogicalPlan` for free and only pays the copy-on-write / move 
when it
   calls `make_mut` / `replace`) would let the framework skip the extraction for
   no-op rule invocations entirely. That is a breaking change to the public
   `OptimizerRule` trait and is out of scope here; it is being prototyped 
separately.
   
   ## Are these changes tested?
   
   Yes. The existing optimizer test suite (713 tests across 
`datafusion-optimizer`)
   passes unchanged. The optimizations are behavior-preserving: the in-place
   traversal produces the same plans as the ownership-based traversal, and the
   subquery short-circuit only skips work that is provably a no-op.
   
   ## Are there any user-facing changes?
   
   No breaking API changes. `LogicalPlan::map_children_mut` is a new public 
method —
   purely additive. Optimization is faster; output plans are unchanged.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to