andygrove commented on PR #3977:
URL: 
https://github.com/apache/datafusion-comet/pull/3977#issuecomment-4269903766

   Update: @andygrove's initial-plan-vs-AQE inconsistency theory is confirmed.
   
   New commit adds `CometDppFallbackConsistencySuite`, which shows that 
`columnarShuffleSupported` (via `stageContainsDPPScan`) gives two different 
answers for the same shuffle:
   
   ```
   initialDppVisible=true,  initialDecision=false   (correctly falls back — DPP 
scan visible)
   postAqeDppVisible=false, postAqeDecision=true    (converts to Comet — DPP 
scan hidden)
   ```
   
   ### Mechanism
   
   `stageContainsDPPScan` in `CometShuffleExchangeExec` walks 
`s.child.exists(...)` looking for a `FileSourceScanExec` with a 
`PlanExpression` partition filter. In AQE, once the inner stage materializes, 
its subtree is replaced by a `ShuffleQueryStageExec`, whose `children` is 
`Seq.empty`. `.exists` no longer descends through it, so the DPP scan becomes 
invisible and the #3879 fallback stops firing. The same shuffle flips from 
Spark to Comet on the second planning pass.
   
   The test demonstrates this by swapping the shuffle's child for an opaque 
`LeafExecNode` stub that matches the tree-walking behavior of a materialized 
stage.
   
   ### Suggested fix
   
   `stageContainsDPPScan` needs to descend into `QueryStageExec.plan` when 
walking for DPP filters. The same care is probably needed anywhere else we use 
`.exists` on a shuffle's child to reason about what's underneath.
   
   Once fixed, the test's `TODO(#3949)` assertions flip to the symmetric form 
(`initialDecision == postAqeDecision`).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to