andygrove commented on code in PR #4010:
URL: https://github.com/apache/datafusion-comet/pull/4010#discussion_r3131711886
##########
spark/src/main/scala/org/apache/comet/rules/CometExecRule.scala:
##########
@@ -97,6 +97,32 @@ case class CometExecRule(session: SparkSession) extends Rule[SparkPlan] {
   private lazy val showTransformations = CometConf.COMET_EXPLAIN_TRANSFORMATIONS.get()
+  /**
+   * Revert any `CometShuffleExchangeExec` with `CometColumnarShuffle` that is sandwiched between
+   * two non-Comet operators back to the original Spark `ShuffleExchangeExec`. Columnar shuffle
+   * converts row-based input to Arrow batches for the shuffle read side; if neither the parent
+   * nor the child is a Comet plan that can consume columnar output, that conversion is pure
+   * overhead (row->arrow->shuffle->arrow->row vs. row->shuffle->row).
+   */
+  private def revertRedundantColumnarShuffle(plan: SparkPlan): SparkPlan = {
Review Comment:
@karuppayya I added a section to the tuning guide and also added a config to
enable/disable this optimization. Could you take another look?
I'm open to refactoring this into separate rules, but I'd prefer to wait for the current DPP work to finish first, along with some work on fixing planning issues with mixed partial/final aggregates.
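For anyone skimming the diff, the revert described in the doc comment can be sketched with a toy plan model. This is not the actual Comet implementation: the `Plan`, `SparkOp`, and `CometOp` classes below are hypothetical stand-ins for Spark's `SparkPlan` hierarchy, and `CometColumnarShuffle` here is a simplified stand-in for `CometShuffleExchangeExec` that keeps a reference to the original Spark shuffle so it can be restored.

```scala
// Hypothetical, self-contained model of a plan tree; stands in for SparkPlan
// so the revert logic can be shown without Spark/Comet dependencies.
sealed trait Plan { def children: Seq[Plan] }
case class SparkOp(name: String, children: Seq[Plan]) extends Plan
case class CometOp(name: String, children: Seq[Plan]) extends Plan
// Models a CometShuffleExchangeExec using CometColumnarShuffle, wrapping the
// original Spark ShuffleExchangeExec so it can be restored later.
case class CometColumnarShuffle(original: SparkOp, child: Plan) extends Plan {
  def children: Seq[Plan] = Seq(child)
}

object RevertSketch {
  // Revert a columnar shuffle whose parent and child are both non-Comet
  // operators: neither side consumes Arrow batches, so the row->arrow and
  // arrow->row conversions around the shuffle are pure overhead.
  def revert(plan: Plan): Plan = plan match {
    case p: SparkOp =>
      val newChildren = p.children.map {
        case CometColumnarShuffle(original, child) if !isComet(child) =>
          // Parent (p) and child are both non-Comet: restore the original
          // Spark shuffle, recursing into the child as usual.
          original.copy(children = Seq(revert(child)))
        case c => revert(c)
      }
      p.copy(children = newChildren)
    case c: CometOp => c.copy(children = c.children.map(revert))
    // Parent is not a plain Spark operator here, so the shuffle stays.
    case s: CometColumnarShuffle => s.copy(child = revert(s.child))
  }

  private def isComet(p: Plan): Boolean = p.isInstanceOf[CometOp]
}
```

In this sketch, a shuffle is reverted only when its parent is a non-Comet operator and its child is too; if either side is a Comet operator that can consume Arrow batches, the columnar shuffle is kept.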
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]