andygrove commented on PR #3977: URL: https://github.com/apache/datafusion-comet/pull/3977#issuecomment-4269903766
Update: @andygrove's initial-plan-vs-AQE inconsistency theory is confirmed. New commit adds `CometDppFallbackConsistencySuite`, which shows that `columnarShuffleSupported` (via `stageContainsDPPScan`) gives two different answers for the same shuffle: ``` initialDppVisible=true, initialDecision=false (correctly falls back — DPP scan visible) postAqeDppVisible=false, postAqeDecision=true (converts to Comet — DPP scan hidden) ``` ### Mechanism `stageContainsDPPScan` in `CometShuffleExchangeExec` walks `s.child.exists(...)` looking for a `FileSourceScanExec` with a `PlanExpression` partition filter. In AQE, once the inner stage materializes, its subtree is replaced by a `ShuffleQueryStageExec`, whose `children` is `Seq.empty`. `.exists` no longer descends through it, so the DPP scan becomes invisible and the #3879 fallback stops firing. The same shuffle flips from Spark to Comet on the second planning pass. The test demonstrates this by swapping the shuffle's child for an opaque `LeafExecNode` stub that matches the tree-walking behavior of a materialized stage. ### Suggested fix `stageContainsDPPScan` needs to descend into `QueryStageExec.plan` when walking for DPP filters. The same care is probably needed anywhere else we use `.exists` on a shuffle's child to reason about what's underneath. Once fixed, the test's `TODO(#3949)` assertions flip to the symmetric form (`initialDecision == postAqeDecision`). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
