andygrove opened a new pull request, #4166:
URL: https://github.com/apache/datafusion-comet/pull/4166

   ## Which issue does this PR close?
   
   Closes #.
   
   ## Rationale for this change
   
   Comet currently always converts a Spark `ShuffleExchangeExec` into a Comet 
columnar (JVM) shuffle, even when the shuffle's child is a plain Spark plan 
with no Comet operator beneath it. In that case the conversion only buys us 
Comet's shuffle implementation, while paying for a row-to-Arrow conversion at 
the shuffle boundary. There are workloads where it is preferable to leave such 
shuffles as regular Spark shuffles and reserve Comet shuffle for cases where 
the child is already a Comet plan.
   
   This PR introduces a config so users can opt out of that conversion without 
disabling Comet shuffle entirely.
   
   ## What changes are included in this PR?
   
   - New config `spark.comet.exec.shuffle.convertFromSparkPlan.enabled` 
(default `true`, preserves current behavior).
   - In `CometShuffleExchangeExec.shuffleSupported`, when the child is not a 
Comet plan and the new config is `false`, tag the node with an explain reason 
and return `None` so the shuffle stays as a regular Spark `ShuffleExchangeExec`. 
The native shuffle path (which already requires a Comet child) is unaffected.
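
The new check can be modeled roughly as follows. This is a minimal sketch in plain Python, not Comet's actual Scala code: the function name mirrors `shuffleSupported`, but the `Decision` type, the `child_is_comet` flag, the `"comet"` placeholder result, and the wording of the explain reason are all assumptions made for illustration. Only the config key and its default come from this PR.

```python
from dataclasses import dataclass
from typing import Optional

# Config key from this PR; it defaults to true, which preserves current behavior.
CONVERT_FROM_SPARK_PLAN = "spark.comet.exec.shuffle.convertFromSparkPlan.enabled"


@dataclass
class Decision:
    # Placeholder for the chosen Comet shuffle, or None to keep the Spark shuffle.
    shuffle: Optional[str]
    # Explain reason recorded on fallback (wording here is illustrative only).
    reason: Optional[str] = None


def shuffle_supported(child_is_comet: bool, conf: dict) -> Decision:
    """Hypothetical model of the new early-exit check."""
    enabled = conf.get(CONVERT_FROM_SPARK_PLAN, "true").lower() == "true"
    if not child_is_comet and not enabled:
        # Leave the exchange as a regular Spark shuffle and say why.
        reason = f"child is not a Comet plan and {CONVERT_FROM_SPARK_PLAN} is false"
        return Decision(None, reason)
    # Otherwise the pre-existing selection logic applies; it is elided here
    # and represented by a single placeholder value.
    return Decision("comet")


if __name__ == "__main__":
    opt_out = {CONVERT_FROM_SPARK_PLAN: "false"}
    print(shuffle_supported(False, {}))       # default: conversion still happens
    print(shuffle_supported(False, opt_out))  # Spark child, opted out: fallback
    print(shuffle_supported(True, opt_out))   # Comet child: unaffected
```

In a real job the opt-out would be an ordinary Spark setting, for example `--conf spark.comet.exec.shuffle.convertFromSparkPlan.enabled=false` on `spark-submit`.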
   
   ## How are these changes tested?
   
   Default behavior is unchanged, so existing test coverage applies. I manually 
verified that the project compiles. Happy to add a targeted test if 
reviewers prefer.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

