andygrove opened a new pull request, #4195:
URL: https://github.com/apache/datafusion-comet/pull/4195

   ## Which issue does this PR close?
   
   Closes #4191 (sub-issue of #4098).
   
   ## Rationale for this change
   
   Three Spark 4.1 tests in \`DataFrameSetOperationsSuite\` were ignored under 
Comet:
   
   - \`SPARK-52921: union partitioning - reused shuffle\`
   - \`SPARK-52921: union partitioning - semantic equality\`
   - \`SPARK-52921: union partitioning - range partitioning\`
   
   The tests inspect the executed plan with strict pattern matches:
   
   \`\`\`scala
   case u: UnionExec => u
   case s: ShuffleExchangeExec => s
   \`\`\`
   
   Under Comet, \`UnionExec\` is replaced by \`CometUnionExec\` (extends 
\`CometExec\`, not \`UnionExec\`) and \`ShuffleExchangeExec\` is replaced by 
\`CometShuffleExchangeExec\` (extends \`ShuffleExchangeLike\`, the trait both 
implementations share). The collectors therefore found zero operators, the 
\`size == 1\` assertions failed, and an \`IgnoreComet\` tag was added pointing 
at the umbrella tracking issue #4098.
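   A minimal sketch of the mismatch, using hypothetical stand-in classes that 
mirror the hierarchy described above (not the real Spark/Comet operators):
   
   \`\`\`scala
   // Hypothetical stand-ins mirroring the real hierarchy (simplified).
   trait ShuffleExchangeLike
   class ShuffleExchangeExec extends ShuffleExchangeLike       // vanilla Spark
   class CometShuffleExchangeExec extends ShuffleExchangeLike  // Comet replacement
   
   // Under Comet, the plan contains only the Comet implementation.
   val cometPlan: Seq[Any] = Seq(new CometShuffleExchangeExec)
   
   // The strict class match finds nothing, so a `size == 1` assertion fails.
   val strict = cometPlan.collect { case s: ShuffleExchangeExec => s }
   assert(strict.isEmpty)
   \`\`\`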
   
   ## What changes are included in this PR?
   
   Patch the matchers in \`dev/diffs/4.1.1.diff\` so the tests recognize 
Comet's wrappers:
   
   - \`case s: ShuffleExchangeExec\` → \`case s: ShuffleExchangeLike\` (one 
trait, matches both impls).
   - \`case u: UnionExec\` → also match \`case u: CometUnionExec\` (no shared 
parent, so two cases).
   
   Both are valid for vanilla Spark too: \`ShuffleExchangeExec\` extends 
\`ShuffleExchangeLike\`, and the additional \`CometUnionExec\` case is simply 
unreachable when Comet is disabled.
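   The patched matchers, sketched with the same hypothetical stand-in classes 
(simplified; not the real operators):
   
   \`\`\`scala
   // Hypothetical stand-ins for the operator classes (simplified).
   trait ShuffleExchangeLike
   class ShuffleExchangeExec extends ShuffleExchangeLike
   class CometShuffleExchangeExec extends ShuffleExchangeLike
   
   class UnionExec
   class CometUnionExec // extends CometExec in reality; no shared Union parent
   
   val plan: Seq[Any] = Seq(new CometShuffleExchangeExec, new CometUnionExec)
   
   // One trait case matches both shuffle implementations.
   val shuffles = plan.collect { case s: ShuffleExchangeLike => s }
   // The union needs two cases; the Comet case is unreachable on vanilla Spark.
   val unions = plan.collect {
     case u: UnionExec      => u
     case u: CometUnionExec => u
   }
   assert(shuffles.size == 1 && unions.size == 1)
   \`\`\`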
   
   ## How are these changes tested?
   
   The fix is test-side only (no production code change). The partitioning 
equality assertions still hold under Comet because:
   
   - \`CometShuffleExchangeExec.apply\` (in \`ShimCometShuffleExchangeExec\`) 
sets \`outputPartitioning = wrapped.outputPartitioning\`, preserving the 
original shuffle's partitioning.
   - \`CometUnionExec.outputPartitioning\` delegates to 
\`originalPlan.outputPartitioning\` (the wrapped \`UnionExec\`), which honors 
\`UNION_OUTPUT_PARTITIONING\` and computes against its original Spark children 
— so the SPARK-52921 semantics are preserved end-to-end.
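   The delegation described above can be sketched with hypothetical simplified 
types (the real code lives in \`CometUnionExec\` and 
\`ShimCometShuffleExchangeExec\`):
   
   \`\`\`scala
   // Hypothetical simplified stand-ins for Spark's partitioning/plan types.
   sealed trait Partitioning
   case class RangePartitioning(numPartitions: Int) extends Partitioning
   
   trait SparkPlan { def outputPartitioning: Partitioning }
   
   // UnionExec computes its partitioning from its original Spark children
   // (honoring UNION_OUTPUT_PARTITIONING); modeled here as a fixed value.
   class UnionExec(p: Partitioning) extends SparkPlan {
     def outputPartitioning: Partitioning = p
   }
   
   // The Comet wrapper delegates to the plan it replaced, so the
   // SPARK-52921 partitioning semantics survive the rewrite.
   class CometUnionExec(originalPlan: SparkPlan) extends SparkPlan {
     def outputPartitioning: Partitioning = originalPlan.outputPartitioning
   }
   
   val original = new UnionExec(RangePartitioning(8))
   val wrapped  = new CometUnionExec(original)
   assert(wrapped.outputPartitioning == original.outputPartitioning)
   \`\`\`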
   
   The Spark SQL CI workflow will exercise the un-ignored tests on Spark 4.1.1.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

