karuppayya commented on code in PR #4010:
URL: https://github.com/apache/datafusion-comet/pull/4010#discussion_r3119024954


##########
spark/src/main/scala/org/apache/comet/rules/CometExecRule.scala:
##########
@@ -97,6 +97,32 @@ case class CometExecRule(session: SparkSession) extends 
Rule[SparkPlan] {
 
   private lazy val showTransformations = 
CometConf.COMET_EXPLAIN_TRANSFORMATIONS.get()
 
+  /**
+   * Revert any `CometShuffleExchangeExec` with `CometColumnarShuffle` that is 
sandwiched between
+   * two non-Comet operators back to the original Spark `ShuffleExchangeExec`. 
Columnar shuffle
+   * converts row-based input to Arrow batches for the shuffle read side; if 
neither the parent
+   * nor the child is a Comet plan that can consume columnar output, that 
conversion is pure
+   * overhead (row->arrow->shuffle->arrow->row vs. row->shuffle->row).
+   */
+  private def revertRedundantColumnarShuffle(plan: SparkPlan): SparkPlan = {

Review Comment:
   I agree that this optimization will improve performance and compute 
efficiency.
   
   My main concern is determining the best recommendation for users to tune 
memory, particularly since they cannot explicitly disable it. 
   
   Also can it be a seperate rule in itself and have it only in 
`org.apache.comet.CometSparkSessionExtensions.CometExecColumnar#postColumnarTransitions`?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to