andygrove commented on code in PR #4010:
URL: https://github.com/apache/datafusion-comet/pull/4010#discussion_r3131711886
##########
spark/src/main/scala/org/apache/comet/rules/CometExecRule.scala:
##########
@@ -97,6 +97,32 @@ case class CometExecRule(session: SparkSession) extends Rule[SparkPlan] {
   private lazy val showTransformations = CometConf.COMET_EXPLAIN_TRANSFORMATIONS.get()
+  /**
+   * Revert any `CometShuffleExchangeExec` with `CometColumnarShuffle` that is sandwiched between
+   * two non-Comet operators back to the original Spark `ShuffleExchangeExec`. Columnar shuffle
+   * converts row-based input to Arrow batches for the shuffle read side; if neither the parent
+   * nor the child is a Comet plan that can consume columnar output, that conversion is pure
+   * overhead (row->arrow->shuffle->arrow->row vs. row->shuffle->row).
+   */
+  private def revertRedundantColumnarShuffle(plan: SparkPlan): SparkPlan = {
Review Comment:
@karuppayya I added a section to the tuning guide and also added a config to
enable/disable this optimization. Could you take another look?
I'm open to refactoring this into separate rules, but I'd prefer to wait for the current DPP work to finish first, along with some work on fixing planning issues with mixed partial/final aggregates.
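For anyone skimming the diff, the revert described in the doc comment can be sketched with a toy plan model. This is not the actual Comet implementation: the `Plan`, `SparkOp`, and `CometOp` classes below are hypothetical stand-ins for Spark's `SparkPlan` hierarchy, and `CometColumnarShuffle` here is a simplified stand-in for `CometShuffleExchangeExec` that keeps a reference to the original Spark shuffle so it can be restored.

```scala
// Hypothetical, self-contained model of a plan tree; stands in for SparkPlan
// so the revert logic can be shown without Spark/Comet dependencies.
sealed trait Plan { def children: Seq[Plan] }
case class SparkOp(name: String, children: Seq[Plan]) extends Plan
case class CometOp(name: String, children: Seq[Plan]) extends Plan
// Models a CometShuffleExchangeExec using CometColumnarShuffle, wrapping the
// original Spark ShuffleExchangeExec so it can be restored later.
case class CometColumnarShuffle(original: SparkOp, child: Plan) extends Plan {
  def children: Seq[Plan] = Seq(child)
}

object RevertSketch {
  // Revert a columnar shuffle whose parent and child are both non-Comet
  // operators: neither side consumes Arrow batches, so the row->arrow and
  // arrow->row conversions around the shuffle are pure overhead.
  def revert(plan: Plan): Plan = plan match {
    case p: SparkOp =>
      val newChildren = p.children.map {
        case CometColumnarShuffle(original, child) if !isComet(child) =>
          // Parent (p) and child are both non-Comet: restore the original
          // Spark shuffle, recursing into the child as usual.
          original.copy(children = Seq(revert(child)))
        case c => revert(c)
      }
      p.copy(children = newChildren)
    case c: CometOp => c.copy(children = c.children.map(revert))
    // Parent is not a plain Spark operator here, so the shuffle stays.
    case s: CometColumnarShuffle => s.copy(child = revert(s.child))
  }

  private def isComet(p: Plan): Boolean = p.isInstanceOf[CometOp]
}
```

In this sketch, a shuffle is reverted only when its parent is a non-Comet operator and its child is too; if either side is a Comet operator that can consume Arrow batches, the columnar shuffle is kept.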
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]