andygrove opened a new issue, #4031:
URL: https://github.com/apache/datafusion-comet/issues/4031
### Summary
Two tests in `CometExecSuite` are currently skipped with
`assume(!isSpark40Plus)`:
- `SparkToColumnar eliminate redundant in AQE`
- `SparkToColumnar override node name for row input`
When the guards are removed and the tests run on Spark 4, both fail with
`List() had length 0 instead of expected length 1` — i.e. the tests look for
exactly one `CometSparkToColumnarExec` in the final AQE plan and find zero.
### Reproducer
Both tests run the same query shape:
```scala
withSQLConf(
  SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
  CometConf.COMET_SHUFFLE_MODE.key -> "jvm") {
  val df = spark
    .range(1000)
    .selectExpr("id as key", "id % 8 as value")
    .toDF("key", "value")
    .groupBy("key")
    .count()
  df.collect()
  val planAfter = df.queryExecution.executedPlan
  val adaptivePlan =
    planAfter.asInstanceOf[AdaptiveSparkPlanExec].executedPlan
  val found = adaptivePlan.collect { case c: CometSparkToColumnarExec => c }
  assert(found.length == 1)
}
```
On Spark 3.5 this produces exactly one `CometSparkToColumnarExec`; on Spark
4 it produces zero.
### Likely root cause
`RangeExec` behavior or the AQE insertion rule differs between Spark 3.5 and
4 such that no `CometSparkToColumnarExec` is inserted in the final plan. Needs
investigation to determine whether:
1. The `CometSparkToColumnarExec` insertion rule should be adjusted for
Spark 4, or
2. Spark 4's `RangeExec` already produces columnar output and the test
assertions are stale, or
3. AQE is eliminating the wrapper in a new way.
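
To help tell these three cases apart, a small diagnostic along the following lines could be dropped into the test after `df.collect()`. This is only a sketch: `PlanDump` and `describe` are hypothetical names, not part of Comet or Spark, and it assumes Spark's `AdaptiveSparkPlanHelper` trait is on the test classpath. It lists every operator in the plan together with its `supportsColumnar` flag. It also walks into AQE query stages, so the output should show whether `RangeExec` reports columnar output and whether a `CometSparkToColumnarExec` is present anywhere in the plan.

```scala
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanHelper

// Hypothetical helper for debugging only; not part of Comet or Spark.
object PlanDump extends AdaptiveSparkPlanHelper {
  // Returns one line per operator. AdaptiveSparkPlanHelper.collect also
  // visits the plans wrapped by AdaptiveSparkPlanExec and query stages.
  def describe(plan: SparkPlan): Seq[String] =
    collect(plan) { case node =>
      s"${node.getClass.getSimpleName} supportsColumnar=${node.supportsColumnar}"
    }
}
```

Calling `PlanDump.describe(df.queryExecution.executedPlan).foreach(println)` on both Spark 3.5 and Spark 4 and comparing the two listings would narrow down which of the explanations above applies.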
### How to reproduce
On a branch with both `assume(!isSpark40Plus)` lines commented out at
`CometExecSuite.scala:2392` and `:2482`, run:
```sh
./mvnw test -Pspark-4.0 -Dtest=none \
  -Dsuites='org.apache.comet.exec.CometExecSuite SparkToColumnar'
```
Both affected tests fail; other `SparkToColumnar` tests pass.