andygrove opened a new issue, #4191:
URL: https://github.com/apache/datafusion-comet/issues/4191
Sub-issue of #4098.
## Description
Three tests in `DataFrameSetOperationsSuite` (Spark 4.1) fail under Comet:
- `SPARK-52921: union partitioning - reused shuffle`
- `SPARK-52921: union partitioning - semantic equality`
- `SPARK-52921: union partitioning - range partitioning`
These tests inspect the executed plan with collectors like:
```scala
val unionExec = union.queryExecution.executedPlan.collect {
  case u: UnionExec => u
}
assert(unionExec.size == 1)
```
Comet replaces `UnionExec` with `CometUnionExec` (which extends
`CometExec`, **not** `UnionExec`) and `ShuffleExchangeExec` with a Comet
equivalent. The collectors find zero matches, so `unionExec.size == 0` and
the assertions fail.
## Root cause
This is a test-side incompatibility with Comet's operator-replacement
strategy. No production code change is needed; the fix is to patch the test
matchers in `dev/diffs/4.1.1.diff` to also accept Comet's wrapper classes.
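
As a rough illustration of the widened matcher, here is a self-contained sketch. The classes in it are hypothetical stand-ins that only share names with `UnionExec` and `CometUnionExec`; `PlanNode` and its `collect` are simplified assumptions that mimic the shape of a plan-tree collector, not the Spark or Comet API. The actual patch would use the real classes inside the test suite, and the `ShuffleExchangeExec` collectors would presumably need the same treatment.

```scala
// Hypothetical stand-ins for plan nodes; NOT the real Spark/Comet classes.
sealed trait PlanNode {
  def children: Seq[PlanNode]
  // Pre-order traversal that gathers every node the partial function accepts.
  def collect[B](pf: PartialFunction[PlanNode, B]): Seq[B] =
    pf.lift(this).toSeq ++ children.flatMap(_.collect(pf))
}
final case class Scan(name: String) extends PlanNode {
  val children: Seq[PlanNode] = Nil
}
final case class UnionExec(children: Seq[PlanNode]) extends PlanNode
// As described in this issue, the Comet node does not extend UnionExec.
final case class CometUnionExec(children: Seq[PlanNode]) extends PlanNode

object UnionMatcherSketch {
  // The plan as it would look after Comet swapped in its own union operator.
  val cometPlan: PlanNode = CometUnionExec(Seq(Scan("a"), Scan("b")))

  // Original test matcher: a type pattern on UnionExec misses the Comet node.
  val narrow = cometPlan.collect { case u: UnionExec => u }

  // Widened matcher: accepts either the Spark node or the Comet replacement.
  val widened = cometPlan.collect {
    case u: UnionExec => u
    case u: CometUnionExec => u
  }

  def main(args: Array[String]): Unit = {
    assert(narrow.isEmpty)    // reproduces the failing condition
    assert(widened.size == 1) // the patched assertion would hold
  }
}
```

Adding a second `case` keeps the assertion meaningful for both vanilla Spark and Comet runs, which is why it is sketched here instead of loosening the expected count.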
## Where
`sql/core/src/test/scala/org/apache/spark/sql/DataFrameSetOperationsSuite.scala`,
currently tagged
`IgnoreComet("https://github.com/apache/datafusion-comet/issues/4098")`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]