andygrove opened a new pull request, #3879:
URL: https://github.com/apache/datafusion-comet/pull/3879
**[EXPERIMENTAL]**
## Which issue does this PR close?
Closes #3874.
## Rationale for this change
When a scan used Dynamic Partition Pruning (DPP) and fell back to Spark,
Comet still wrapped the stage in a columnar shuffle, producing an
inefficient plan with redundant columnar-to-row and row-to-columnar transitions:
```
CometShuffleWriter
  CometRowToColumnar
    SparkFilter
      SparkColumnarToRow
        SparkScan
```
This caused orders-of-magnitude slowdowns in TPC-DS queries that use
DPP on fact-table scans.
## What changes are included in this PR?
Adds a DPP check in `columnarShuffleSupported()` in
`CometShuffleExchangeExec`. When `spark.comet.dppFallback.enabled=true` (the
default), the method now walks the child plan tree to detect
`FileSourceScanExec` nodes with dynamic pruning filters. If found, it returns
`false`, preventing the shuffle exchange from being converted to Comet and
allowing the entire stage to fall back to Spark.
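
The check described above can be illustrated with a minimal, self-contained sketch. It does not use the real Spark or Comet classes: `PlanNode`, `Scan`, `Filter`, `Join`, and `DppShuffleCheck` are hypothetical stand-ins, and the `dppFallbackEnabled` parameter stands in for `spark.comet.dppFallback.enabled`. The actual logic in `CometShuffleExchangeExec` may differ in detail.

```scala
// Toy stand-ins for physical plan nodes (hypothetical, NOT Spark/Comet types).
sealed trait PlanNode {
  def children: Seq[PlanNode]
}

// Stand-in for a file scan; the flag models "has dynamic pruning filters".
final case class Scan(table: String, hasDynamicPruningFilter: Boolean) extends PlanNode {
  def children: Seq[PlanNode] = Nil
}

final case class Filter(child: PlanNode) extends PlanNode {
  def children: Seq[PlanNode] = Seq(child)
}

final case class Join(left: PlanNode, right: PlanNode) extends PlanNode {
  def children: Seq[PlanNode] = Seq(left, right)
}

object DppShuffleCheck {
  // Recursively walk the tree and report whether any scan carries a
  // dynamic pruning filter.
  def containsDppScan(plan: PlanNode): Boolean = plan match {
    case Scan(_, hasDpp) => hasDpp
    case other           => other.children.exists(containsDppScan)
  }

  // Mirrors the described behavior: with the fallback flag on, a DPP scan
  // anywhere under the shuffle makes columnar shuffle unsupported.
  def columnarShuffleSupported(child: PlanNode, dppFallbackEnabled: Boolean): Boolean =
    !(dppFallbackEnabled && containsDppScan(child))
}
```

In this sketch, `DppShuffleCheck.columnarShuffleSupported(Filter(Scan("store_sales", hasDynamicPruningFilter = true)), dppFallbackEnabled = true)` evaluates to `false`, so the shuffle would stay in Spark; with the flag set to `false`, the same plan is still eligible for columnar shuffle.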
## How are these changes tested?
Adds a new test, `DPP fallback avoids inefficient Comet shuffle (#3874)`, that
forces a sort-merge join with DPP and verifies that no `CometColumnarShuffle`
appears in the plan. The existing `DPP fallback` test continues to pass.