avantgardnerio commented on PR #23167:
URL: https://github.com/apache/datafusion/pull/23167#issuecomment-4803348695
And as promised, answers to individual questions:
> Why not extend the existing partition_statistics method instead of adding
runtime_row_count?
I think we should do that. What I have was convenient for fall through to
children.
> Feels like a parallel optimization framework. There shouldn't be a
difference between pre-execution and in-flight optimization.
That's exactly the target, and it's why RuntimeRule has the same signature
as PhysicalOptimizerRule now. The two traits exist today only because
physical-plan can't depend on physical-optimizer (the dependency runs the
other way) . End-state: one trait, with rules optionally reading runtime stats
from boundaries in the plan tree they receive. Static rules ignore runtime
state; runtime-aware rules use it. Per-rule migration: any existing
PhysicalOptimizerRule (including JoinSelection) can be made runtime-aware by
adding state inspection. Existing rules don't change until they want to.
> Pipeline breakers for HJ and NLJ?
For HJ this PR inserts one: InsertHashJoinBoundaries wraps each HashJoin
input in a StageBoundaryBuffer. The buffer materializes its input and IS the
pipeline breaker — we don't need a natural one inside the join.
- Same pattern works for NLJ (no natural breaker, just insert one above
each input — ~30-line rule).
- For SMJ either approach works: reuse the existing SortExec (it's already
a breaker — insert the boundary above it for stats) or insert directly above
the SMJ inputs. Either way the infrastructure is the same.
- The contribution is precisely this generalization: any operator can
become adaptive without depending on a natural breaker existing inside it.
> Joins usually don't want to fully know both sides; when does non-shuffle
execution benefit?
- The win case is exactly what the SLT shows: one side is derived
(aggregation, filter, scan-with-projection) whose post-operator size is much
smaller than the planner's Inexact upper bound. Materializing that side is
cheap relative to the bad build-side choice it lets us avoid.
- For sides where materialization is genuinely unsafe (large streaming
inputs), the stream-through fallback (point 5 of the summary) gates: if memory
pressure hits the threshold, the boundary releases without finishing
materialization, the runtime rule sees an unflagged stage and bails, the query
runs exactly as it would today.
- Net: adaptive wins when stats are knowable cheaply; otherwise we're no
worse than the pre-AQE baseline.
> Reuse existing optimizations like JoinSelection.
- Strongly agree. SwapBuildSideIfInverted already calls
HashJoinExec::swap_inputs — the same primitive JoinSelection uses. Same logic,
fresher numbers.
- With the trait unification above, JoinSelection itself could run at
runtime against the post-breaker plan, no re-implementation needed.
> RuntimeOptimizerExec feels hacky — reusing an ExecutionPlan for AQE
plumbing.
- Fair concern. The reason it IS an ExecutionPlan: that's the only point
in the existing pipeline where we can intercept poll-time decisions without
changes to every operator or to the session layer. The RTO sits at the root,
opt-in via an optimizer rule.
- The wrapper-at-root pattern matches other "coordinator" nodes in DF
(e.g. how CoalescePartitionsExec orchestrates downstream). The operator
interface is the natural composition surface.
- Open to alternatives — a SessionContext-level coordinator would also
work but requires plumbing into every execute path. The wrapper was the
surgical "get it done today, show working code" compromise of this PR.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]