Shekharrajak opened a new issue, #3881:
URL: https://github.com/apache/datafusion-comet/issues/3881

   ### What is the problem the feature request solves?
   
   Comet does not support ExistenceJoin, causing incorrect results for 
correlated IN subqueries combined with OR on Spark 4.0. Adding native 
ExistenceJoin support would allow Comet to handle these plans end-to-end and 
eliminate the mixed Spark/Comet execution that produces wrong results.
   
   The following query returns incorrect results when Comet is enabled on Spark 
4.0:
   
   ```
   CREATE TEMPORARY VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2);
   CREATE TEMPORARY VIEW t2(c1, c2) AS VALUES (0, 2), (0, 3);
   
   SELECT * FROM t1 WHERE
     c1 IN (SELECT count(*) + 1 FROM t2 WHERE t2.c1 = t1.c1) OR
     c2 IN (SELECT count(*) - 1 FROM t2 WHERE t2.c1 = t1.c1);
   ```
   
   Expected: (0, 1) and (1, 2) (both rows match via different OR branches) 
Actual with Comet: (1, 2) only (row (0, 1) is dropped)
   
   The same query with a single IN (no OR) produces correct results.
   
   
   
   ### Describe the potential solution
   
   _No response_
   
   ### Additional context
   
   This only reproduces on Spark 4.0 because:
   
   in-count-bug.sql test only exists in Spark 4.0
   Spark 4.0 uses a new decorrelation path 
(decorrelateInnerQueryEnabledForExistsIn = true by default) that produces 
DomainJoin + ExistenceJoin plan structures
   Spark 3.5 uses the old decorrelation which has the "count bug" (different 
expected behavior)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to