kosiew opened a new pull request, #22771:
URL: https://github.com/apache/datafusion/pull/22771
## Which issue does this PR close?
* Part of #22670
## Rationale for this change
The projection branch of `ScalarSubqueryToJoin` tracked rewrite ownership
through multiple parallel structures (`all_subqueries`, `alias_to_index`, and
`rewrite_exprs`). While functionally correct, this made ownership indirect and
increased the risk of applying compensation expressions to the wrong projection
slot during future changes.
This refactor makes rewrite ownership explicit by associating rewritten
expressions and their generated subquery aliases with a dedicated per-slot
state structure, while preserving existing behavior, join order, alias
generation, compensation semantics, and projection output names.
## What changes are included in this PR?
* Introduce a private `ProjectionRewriteState` struct containing:
  * `rewritten_expr`
  * `subquery_aliases`
* Refactor projection expression extraction to build per-slot rewrite state
instead of maintaining separate rewrite-expression and ownership containers.
* Derive alias ownership from the slot state and map aliases back to the
owning projection slot during join conversion.
* Rewrite compensation handling for projection expressions to update only
the owning slot's expression.
* Extract compensation-expression replacement logic into a reusable helper:
  * `apply_compensation_exprs`
* Update the filter branch to use the new helper for compensation expression
application.
* Preserve existing projection name restoration logic when rebuilding the
final projection.
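
The per-slot ownership described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: only the names `ProjectionRewriteState`, `rewritten_expr`, `subquery_aliases`, and `apply_compensation_exprs` come from the description. The toy `Expr` enum, the field types, the helper signature, and the functions `alias_owners` and `compensate_slot` are assumptions made so the example is self-contained.

```rust
use std::collections::HashMap;

/// Toy expression type standing in for DataFusion's `Expr` (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
}

/// Per-slot rewrite state. Field names follow the PR description;
/// the field types are assumptions.
#[derive(Debug, Clone)]
struct ProjectionRewriteState {
    rewritten_expr: Expr,
    subquery_aliases: Vec<String>,
}

/// Replace every column that has a compensation expression.
/// The signature is assumed, not taken from the PR.
fn apply_compensation_exprs(expr: &Expr, comp: &HashMap<String, Expr>) -> Expr {
    match expr {
        Expr::Column(name) => comp.get(name).cloned().unwrap_or_else(|| expr.clone()),
        Expr::Literal(_) => expr.clone(),
        Expr::Add(l, r) => Expr::Add(
            Box::new(apply_compensation_exprs(l, comp)),
            Box::new(apply_compensation_exprs(r, comp)),
        ),
    }
}

/// Map each generated subquery alias to the index of the slot that owns it
/// (hypothetical helper).
fn alias_owners(slots: &[ProjectionRewriteState]) -> HashMap<String, usize> {
    let mut owners = HashMap::new();
    for (idx, slot) in slots.iter().enumerate() {
        for alias in &slot.subquery_aliases {
            owners.insert(alias.clone(), idx);
        }
    }
    owners
}

/// Apply compensation for one alias to the owning slot only
/// (hypothetical helper); other slots are left untouched.
fn compensate_slot(
    slots: &mut [ProjectionRewriteState],
    alias: &str,
    comp: &HashMap<String, Expr>,
) {
    if let Some(&idx) = alias_owners(slots).get(alias) {
        let updated = apply_compensation_exprs(&slots[idx].rewritten_expr, comp);
        slots[idx].rewritten_expr = updated;
    }
}
```

Because each slot carries its own alias list, the alias-to-slot map is derived from the slots rather than maintained alongside them, which is the property the refactor is after.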
## Are these changes tested?
Yes.
Added SQLLogicTests covering projection rewrite ownership and compensation
behavior:
* `correlated_scalar_subquery_multiple_projection_slots`
* `correlated_scalar_subquery_multiple_subqueries_one_projection_slot`
* `correlated_scalar_subquery_mixed_repeated_and_non_count_projection_slots`
These tests verify that:
* Distinct projection slots receive compensation independently.
* Multiple scalar subqueries within a single projection expression are all
compensated correctly.
* Repeated `COUNT` scalar subqueries are compensated while non-`COUNT`
aggregates retain their existing `NULL` semantics.
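
As a rough model of the semantics these tests check, the sketch below treats the post-join value as an `Option<i64>`, where `None` stands for SQL `NULL`. The `Agg` enum and both functions are hypothetical and exist only for illustration. They encode the standard SQL rule that `COUNT` over an empty input is `0`, while an aggregate such as `MAX` over an empty input is `NULL`.

```rust
/// Aggregate kinds used in this illustration (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Agg {
    Count,
    Max,
}

/// Value a scalar subquery yields when no inner rows match the outer row:
/// COUNT gives 0, MAX gives NULL (modeled as `None`).
fn value_on_empty_input(agg: Agg) -> Option<i64> {
    match agg {
        Agg::Count => Some(0),
        Agg::Max => None,
    }
}

/// After the left join, an unmatched outer row sees `None` for the subquery
/// column. Compensation substitutes the empty-input value for that case.
fn compensate(agg: Agg, joined: Option<i64>) -> Option<i64> {
    joined.or(value_on_empty_input(agg))
}
```

Under this model, an unmatched row becomes `Some(0)` for `COUNT` and stays `None` for `MAX`, and matched rows pass through unchanged.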
## Are there any user-facing changes?
No.
This is a behavior-preserving internal refactor of the optimizer rule. The
included tests validate that existing SQL semantics are maintained.
## LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content
has been manually reviewed and tested.