andygrove commented on PR #3977: URL: https://github.com/apache/datafusion-comet/pull/3977#issuecomment-4269945397
Honest update: the decision flip is real, but it is **not sufficient** to reproduce the full #3949 crash.

I tried many DPP-shaped queries that match the general pattern (inner DPP join + outer broadcast join + aggregate/topK/intersect/IN-subquery, across BHJ/SMJ/coalesce/localRead variants). Every query executed cleanly, with no `ColumnarToRowExec` canonicalization assertion. So the decision flip demonstrated in `CometDppFallbackConsistencySuite` is a real inconsistency, but something else (larger scale, specific stats, or a plan shape unique to q14a/q14b/q31/q47/q57) is needed to actually crash.

Where that leaves us:

1. `stageContainsDPPScan` descending into `QueryStageExec.plan` is likely still worth doing as a correctness fix, but I can no longer claim it will close #3949 until we have a real repro.
2. The fuzz + canonicalization infrastructure in this PR is still useful as regression coverage going forward.
3. To make progress, we likely need the actual plan from a failing awslabs run, or a diagnostic build that logs the plan at the moment `AdaptiveSparkPlanExec.createQueryStages` crashes.
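
For readers without the PR open, here is a toy sketch of the kind of inconsistency item 1 refers to. It is not Comet or Spark code: `Node`, `contains_dpp_scan`, and every field name are hypothetical stand-ins. It assumes the flip arises because a query stage behaves as a leaf that wraps its plan, so a walk that only follows children stops at the stage boundary and no longer sees a DPP scan inside it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a physical plan node."""
    name: str
    children: List["Node"] = field(default_factory=list)
    has_dpp_filter: bool = False          # models a scan carrying a dynamic pruning filter
    stage_plan: Optional["Node"] = None   # set only on query-stage wrappers, which have no children


def contains_dpp_scan(node: Node, descend_into_stages: bool) -> bool:
    """Return True if a DPP scan is reachable from `node`.

    With descend_into_stages=False the walk follows children only, so a
    query-stage wrapper hides whatever plan it holds.
    """
    if node.has_dpp_filter:
        return True
    if descend_into_stages and node.stage_plan is not None:
        if contains_dpp_scan(node.stage_plan, descend_into_stages):
            return True
    return any(contains_dpp_scan(c, descend_into_stages) for c in node.children)


# The same logical plan before and after the inner join is wrapped in a stage.
fact_scan = Node("FactScan", has_dpp_filter=True)
inner_join = Node("InnerJoin", children=[fact_scan, Node("DimScan")])
before = Node("OuterJoin", children=[inner_join, Node("BroadcastSide")])
stage = Node("QueryStage", stage_plan=inner_join)
after = Node("OuterJoin", children=[stage, Node("BroadcastSide")])

shallow_before = contains_dpp_scan(before, descend_into_stages=False)
shallow_after = contains_dpp_scan(after, descend_into_stages=False)
deep_after = contains_dpp_scan(after, descend_into_stages=True)
```

In this model the child-only walk gives different answers for `before` and `after`, while the walk that descends into the wrapped plan gives the same answer for both. Whether that alone accounts for the #3949 crash is exactly what remains unconfirmed above.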
