andygrove opened a new issue, #4042:
URL: https://github.com/apache/datafusion-comet/issues/4042

   ## Summary
   
   With `native_datafusion`, a scalar subquery pushed down as a data filter on
   `CometNativeScanExec` does not produce a `ReusedSubqueryExec`, as it does with
   Spark's vectorized reader and with `CometScanExec`. The pushed subquery stays a
   plain `Subquery`, so later references to the same subquery do not share its
   result.
   
   ## Failing Test
   
   `SubquerySuite`: "SPARK-43402: FileSourceScanExec supports push down data filter with scalar subquery"
   
   ## Reproduction
   
   Updating the test's plan-match to include `CometNativeScanExec`:
   
   ```scala
   val dataSourceScanExec = collect(df.queryExecution.executedPlan) {
     case f: FileSourceScanLike => f
     case c: CometScanExec => c
     case n: CometNativeScanExec => n
   }
   ```
   
   makes the first assertion (`dataSourceScanExec.size == 1`) pass. The next 
assertion still fails:
   
   ```
   was not instance of org.apache.spark.sql.execution.ReusedSubqueryExec (SubquerySuite.scala:2716)
   ```
   
   with the plan showing a plain `Subquery` rather than `ReusedSubqueryExec`:
   
   ```
   Subquery subquery#295, [id=#166]
   +- AdaptiveSparkPlan isFinalPlan=true
      +- == Final Plan ==
         ResultQueryStage 2
         +- CometNativeColumnarToRow
            +- CometHashAggregate [min#303], Final, [min(c2#297)]
               +- ShuffleQueryStage 0
                  +- CometExchange SinglePartition, ...
                     +- CometHashAggregate [c2#297], Partial, [partial_min(c2#297)]
                        +- CometNativeScan parquet ...
   ```
   
   The `dataFilters` on the `CometNativeScanExec` carry the subquery reference 
but aren't wired into the reused-subquery machinery.
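
The gap described above can be pictured with a small toy model. This is only a sketch and not Spark's or Comet's implementation: every type here (`PlainSubquery`, `ReusedSubquery`, `SubqueryFilter`, `Node`, `ReuseSketch`) is a hypothetical stand-in, and the parent-first traversal is an assumption made for simplicity. It shows that a reuse pass can only wrap a repeated subquery if it actually visits the filters that reference it.

```scala
// Toy model only: every type below is a hypothetical stand-in, not a Spark or Comet class.
sealed trait SubqueryPlan { def key: String }
final case class PlainSubquery(key: String) extends SubqueryPlan
final case class ReusedSubquery(child: PlainSubquery) extends SubqueryPlan {
  def key: String = child.key
}

// A filter expression that compares a column against a scalar subquery.
final case class SubqueryFilter(column: String, plan: SubqueryPlan)

// A plan node. `visibleFilters` are seen by the reuse pass; `hiddenFilters` are not.
final case class Node(
    name: String,
    visibleFilters: Seq[SubqueryFilter] = Nil,
    hiddenFilters: Seq[SubqueryFilter] = Nil,
    children: Seq[Node] = Nil)

object ReuseSketch {
  // Walk the plan parent-first; wrap every repeat of an already-seen subquery.
  def reuse(root: Node): Node = {
    val seen = scala.collection.mutable.Map.empty[String, PlainSubquery]

    def rewrite(f: SubqueryFilter): SubqueryFilter = f.plan match {
      case p: PlainSubquery =>
        seen.get(p.key) match {
          case Some(first) => f.copy(plan = ReusedSubquery(first))
          case None =>
            seen.update(p.key, p) // first occurrence stays a plain subquery
            f
        }
      case _: ReusedSubquery => f
    }

    def go(n: Node): Node = {
      val filters = n.visibleFilters.map(rewrite) // hiddenFilters are never visited
      n.copy(visibleFilters = filters, children = n.children.map(go))
    }

    go(root)
  }

  def main(args: Array[String]): Unit = {
    val sub = PlainSubquery("min(c2)")
    val pushed = SubqueryFilter("c1", sub)

    // Scan whose pushed filter is visible to the reuse pass.
    val wired = Node("Filter", visibleFilters = Seq(pushed),
      children = Seq(Node("Scan", visibleFilters = Seq(pushed))))
    // Scan whose pushed filter is not visible to the reuse pass.
    val unwired = Node("Filter", visibleFilters = Seq(pushed),
      children = Seq(Node("Scan", hiddenFilters = Seq(pushed))))

    println(reuse(wired).children.head.visibleFilters.head.plan)
    // prints: ReusedSubquery(PlainSubquery(min(c2)))
    println(reuse(unwired).children.head.hiddenFilters.head.plan)
    // prints: PlainSubquery(min(c2))
  }
}
```

In the second case the scan's filter still references the subquery, but it stays plain because the pass never reaches it. That matches the symptom reported here, though this issue does not establish that this is the actual cause in Comet.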
   
   ## Related
   
   Split from #3315 while triaging the tests previously ignored under #3321.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

