neilconway opened a new pull request, #21352:
URL: https://github.com/apache/datafusion/pull/21352

   ## Which issue does this PR close?
   
   - Closes #21199.
   
   ## Rationale for this change
   
   This PR adds support for LEFT join semantics for lateral joins. This is a 
bit tricky because of how it interacts with compensation for the "count bug". 
This might be easiest to illustrate with an example; consider this query (Q1):
   
   ```sql
     SELECT t1.id, sub.cnt
     FROM t1 LEFT JOIN LATERAL (
         SELECT count(*) AS cnt FROM t2 WHERE t2.t1_id = t1.id
     ) sub ON sub.cnt > 0
     ORDER BY t1.id;
   ```
   
   The initial decorrelation (Q2) is
   
   ```sql
     SELECT t1.id, sub.cnt
     FROM t1 LEFT JOIN (
         SELECT count(*) AS cnt, t2.t1_id, TRUE AS __always_true
         FROM t2 GROUP BY t2.t1_id
     ) sub ON t1.id = sub.t1_id
     ```
   
   Ignoring the user's original `ON` clause for now. This initial query is 
wrong, because `t1` rows that don't have a match in `t2` will get all-`NULL` 
values, not `0` for `count(*)` of an empty set. This is the "count bug", and we 
compensate for that by checking for rows when `__always_true` is `NULL`, and 
replacing the agg value with the default for that agg (Q3):
   
   ```sql
     SELECT t1.id,
            CASE WHEN sub.__always_true IS NULL THEN 0
                 ELSE sub.cnt END AS cnt
     FROM (  /* Q2 */  )
   ```
   
   Now we just need to handle the user's original `ON` clause. We can't add 
this to the rewritten `ON` clause in Q1, because we *don't* want the count-bug 
compensation to fire. But we also can't just add it to the `WHERE` clause, 
because we need left join semantics. So we can instead wrap another `CASE` that 
re-checks the `ON` condition and substitutes `NULL` for every right-side column:
   
   ```sql
     SELECT t1.id,
            CASE WHEN (cnt > 0) IS NOT TRUE THEN NULL
                 ELSE cnt END AS cnt
     FROM (  /* Q3 */  )
     ```
   
   ## What changes are included in this PR?
   
   * Implement lateral left join rewrite as described above
   * Update expected tests and add more test cases
   * Update documentation
   
   ## Are these changes tested?
   
   Yes.
   
   ## Are there any user-facing changes?
   
   Support for a new feature.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to