kosiew opened a new pull request, #22771:
URL: https://github.com/apache/datafusion/pull/22771

   ## Which issue does this PR close?
   
   * Part of #22670
   
   ## Rationale for this change
   
   The projection branch of `ScalarSubqueryToJoin` tracked rewrite ownership 
through multiple parallel structures (`all_subqueries`, `alias_to_index`, and 
`rewrite_exprs`). While functionally correct, this made ownership indirect and 
increased the risk of applying compensation expressions to the wrong projection 
slot during future changes.
   
   This refactor makes rewrite ownership explicit by associating rewritten 
expressions and their generated subquery aliases with a dedicated per-slot 
state structure, while preserving existing behavior, join order, alias 
generation, compensation semantics, and projection output names. 
   
   ## What changes are included in this PR?
   
   * Introduce a private `ProjectionRewriteState` struct containing:
   
     * `rewritten_expr`
     * `subquery_aliases`
   * Refactor projection expression extraction to build per-slot rewrite state 
instead of maintaining separate rewrite-expression and ownership containers.
   * Derive alias ownership from the slot state and map aliases back to the 
owning projection slot during join conversion.
   * Rewrite compensation handling for projection expressions to update only 
the owning slot's expression.
   * Extract compensation-expression replacement logic into a reusable helper:
   
     * `apply_compensation_exprs`
   * Update the filter branch to use the new helper for compensation expression 
application.
   * Preserve existing projection name restoration logic when rebuilding the 
final projection. 
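To illustrate the per-slot ownership idea described in the list above, here is a minimal sketch in plain Rust. It is not the code from this PR: `ToyExpr` is a hypothetical stand-in for DataFusion's expression type, and the field types, the `alias_owners` and `compensate_owner` helpers, and the signature of `apply_compensation_exprs` are all assumptions. Only the names `ProjectionRewriteState`, `rewritten_expr`, `subquery_aliases`, and `apply_compensation_exprs` come from the description. The sketch also assumes compensation expressions are keyed by bare column name.

```rust
use std::collections::HashMap;

/// Hypothetical toy expression type standing in for a real plan expression.
#[derive(Debug, Clone, PartialEq)]
enum ToyExpr {
    Column(String),
    Literal(i64),
    Add(Box<ToyExpr>, Box<ToyExpr>),
}

/// Per-slot rewrite state. Field names follow the PR description;
/// the field types are assumptions made for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct ProjectionRewriteState {
    rewritten_expr: ToyExpr,
    subquery_aliases: Vec<String>,
}

/// Derive an alias -> owning slot index map from the slot states
/// (hypothetical helper, not named in the PR).
fn alias_owners(slots: &[ProjectionRewriteState]) -> HashMap<String, usize> {
    let mut owners = HashMap::new();
    for (idx, slot) in slots.iter().enumerate() {
        for alias in &slot.subquery_aliases {
            owners.insert(alias.clone(), idx);
        }
    }
    owners
}

/// Replace every column that has a compensation expression, recursively.
/// The signature is an assumption; the PR only names the helper.
fn apply_compensation_exprs(expr: &ToyExpr, comp: &HashMap<String, ToyExpr>) -> ToyExpr {
    match expr {
        ToyExpr::Column(name) => comp.get(name).cloned().unwrap_or_else(|| expr.clone()),
        ToyExpr::Literal(_) => expr.clone(),
        ToyExpr::Add(left, right) => ToyExpr::Add(
            Box::new(apply_compensation_exprs(left, comp)),
            Box::new(apply_compensation_exprs(right, comp)),
        ),
    }
}

/// Apply the compensation produced for one subquery alias to the slot
/// that owns that alias, leaving every other slot untouched.
fn compensate_owner(
    slots: &mut [ProjectionRewriteState],
    owners: &HashMap<String, usize>,
    alias: &str,
    comp: &HashMap<String, ToyExpr>,
) {
    if let Some(&idx) = owners.get(alias) {
        let updated = apply_compensation_exprs(&slots[idx].rewritten_expr, comp);
        slots[idx].rewritten_expr = updated;
    }
}
```

In this toy model, two slots can reference a column with the same bare name coming from different subqueries. Looking up the owner by alias before applying the replacement keeps a compensation expression from leaking into a slot that does not own the subquery.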
   
   ## Are these changes tested?
   
   Yes.
   
   Added SQLLogicTests covering projection rewrite ownership and compensation 
behavior:
   
   * `correlated_scalar_subquery_multiple_projection_slots`
   * `correlated_scalar_subquery_multiple_subqueries_one_projection_slot`
   * `correlated_scalar_subquery_mixed_repeated_and_non_count_projection_slots`
   
   These tests verify that:
   
   * Distinct projection slots receive compensation independently.
   * Multiple scalar subqueries within a single projection expression are all 
compensated correctly.
   * Repeated `COUNT` scalar subqueries are compensated while non-`COUNT` 
aggregates retain their existing `NULL` semantics. 
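The last point reflects the general reason a decorrelated `COUNT` needs compensation, which can be sketched in plain Rust as follows. This is a conceptual model, not the rule's implementation: `Agg`, `direct`, `via_left_join`, and `compensate` are hypothetical names, and `None` stands in for SQL `NULL`. The model assumes that an outer row with no matching subquery rows gets `NULL` from the left join for every aggregate column.

```rust
/// Aggregate kinds used in this toy model (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Agg {
    Count,
    Max,
}

/// Value of the correlated scalar subquery evaluated directly for one outer row.
/// COUNT over zero rows is 0; MAX over zero rows is NULL (`None`).
fn direct(agg: Agg, matching_rows: &[i64]) -> Option<i64> {
    match agg {
        Agg::Count => Some(matching_rows.len() as i64),
        Agg::Max => matching_rows.iter().copied().max(),
    }
}

/// Value seen after rewriting to a grouped subquery plus LEFT JOIN:
/// with no matching rows there is no group, so the join yields NULL.
fn via_left_join(agg: Agg, matching_rows: &[i64]) -> Option<i64> {
    if matching_rows.is_empty() {
        None
    } else {
        direct(agg, matching_rows)
    }
}

/// Compensation: turn the join's NULL back into 0 for COUNT only.
fn compensate(agg: Agg, joined: Option<i64>) -> Option<i64> {
    match (agg, joined) {
        (Agg::Count, None) => Some(0),
        (_, value) => value,
    }
}
```

With compensation applied, the join-based value matches the directly evaluated value for both aggregates: `COUNT` over no rows becomes 0 again, while `MAX` over no rows stays `NULL`.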
   
   ## Are there any user-facing changes?
   
   No.
   
   This is a behavior-preserving internal refactor of the optimizer rule. The 
included tests validate that existing SQL semantics are maintained. 
   
   ## LLM-generated code disclosure
   
   This PR includes LLM-generated code and comments. All LLM-generated content 
has been manually reviewed and tested.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

