andygrove opened a new issue, #4031:
URL: https://github.com/apache/datafusion-comet/issues/4031

   ### Summary
   
   Two tests in `CometExecSuite` are currently skipped with `assume(!isSpark40Plus)`:
   
   - `SparkToColumnar eliminate redundant in AQE`
   - `SparkToColumnar override node name for row input`
   
   When the guards are removed and the tests run on Spark 4, both fail with `List() had length 0 instead of expected length 1`. Each test expects exactly one `CometSparkToColumnarExec` in the final AQE plan and finds none.
   
   ### Reproducer
   
   Both tests run the same query shape:
   
   ```scala
   withSQLConf(
       SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
       CometConf.COMET_SHUFFLE_MODE.key -> "jvm") {
     val df = spark
       .range(1000)
        .selectExpr("id as key", "id % 8 as value")
        .toDF("key", "value")
        .groupBy("key")
       .count()
     df.collect()
   
     val planAfter = df.queryExecution.executedPlan
     val adaptivePlan =
       planAfter.asInstanceOf[AdaptiveSparkPlanExec].executedPlan
     val found = adaptivePlan.collect { case c: CometSparkToColumnarExec => c }
     assert(found.length == 1)
   }
   ```
   
   On Spark 3.5 this produces exactly one `CometSparkToColumnarExec`; on Spark 4 it produces zero.
   
   ### Likely root cause
   
   Either `RangeExec` behavior or the AQE insertion rule appears to differ between Spark 3.5 and Spark 4, so that no `CometSparkToColumnarExec` ends up in the final plan. Investigation is needed to determine whether:
   
   1. The `CometSparkToColumnarExec` insertion rule should be adjusted for Spark 4, or
   2. Spark 4's `RangeExec` already produces columnar output and the test assertions are stale, or
   3. AQE is eliminating the wrapper in a new way.
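   
   To help tell these three cases apart, a small diagnostic along the following lines might be useful. It is only a sketch: `PlanProbe`, `describe`, and `sparkToColumnarNodes` are hypothetical names, not part of Comet or Spark. It relies on Spark's `AdaptiveSparkPlanHelper`, whose `collect` descends into AQE query stages, and it matches operators by class name so that it does not assume anything about Comet's package layout.
   
   ```scala
   import org.apache.spark.sql.execution.SparkPlan
   import org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanHelper

   // Hypothetical diagnostic helper, not part of Comet or Spark.
   object PlanProbe extends AdaptiveSparkPlanHelper {

     // One line per operator: class name plus whether it reports columnar output.
     // AdaptiveSparkPlanHelper.collect also visits plans nested in AQE query stages.
     def describe(plan: SparkPlan): Seq[String] =
       collect(plan) { case p: SparkPlan =>
         s"${p.getClass.getSimpleName} supportsColumnar=${p.supportsColumnar}"
       }

     // Operators whose class name mentions SparkToColumnar (matched by name,
     // which is an assumption about how the Comet operator class is named).
     def sparkToColumnarNodes(plan: SparkPlan): Seq[SparkPlan] =
       collect(plan) {
         case p if p.getClass.getSimpleName.contains("SparkToColumnar") => p
       }
   }
   ```
   
   Calling `PlanProbe.describe(df.queryExecution.executedPlan).foreach(println)` after `df.collect()` on both Spark 3.5 and Spark 4 should show whether the range operator reports `supportsColumnar=true` on Spark 4 (which would point to case 2) and whether a `SparkToColumnar` operator is present inside a query stage but not visible to a plain `collect` on the top-level plan.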
   
   ### How to reproduce
   
   On a branch with both `assume(!isSpark40Plus)` lines commented out (at `CometExecSuite.scala:2392` and `:2482`):
   
   ```sh
   ./mvnw test -Pspark-4.0 -Dtest=none \
     -Dsuites='org.apache.comet.exec.CometExecSuite SparkToColumnar'
   ```
   
   Both affected tests fail; other `SparkToColumnar` tests pass.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

