andygrove opened a new issue, #4577: URL: https://github.com/apache/datafusion-comet/issues/4577
### Background #4531 (closing #4526) fixed deep `And` / `Or` chains overflowing protobuf's default recursion limit (100) when the serialized plan is re-parsed, on the JVM via `OperatorOuterClass.Operator.parseFrom` (e.g. `findShuffleScanIndices` / explain) and in the Rust `prost` decoder. The fix flattens an associative chain (`QueryPlanSerde.flattenAssociative`) and rebuilds it as a balanced `O(log n)`-depth tree (`QueryPlanSerde.createBalancedBinaryExpr`), routing `CometAnd` / `CometOr` through it. As @mbutrovich noted in [review](https://github.com/apache/datafusion-comet/pull/4531#pullrequestreview-4403399601), that fix only covers the boolean case. The protobuf recursion limit applies to any deeply nested `BinaryExpr`, so a long left-deep chain of other associative binary operators hits the same overflow. ### Proposal Extend the rebalancing introduced in #4531 to other associative binary expressions, for example `Add`, `Multiply`, and the bitwise operators (`BitwiseAnd`, `BitwiseOr`, `BitwiseXor`). The `flattenAssociative` / `createBalancedBinaryExpr` helpers are already generic over the wrap function and the match/children predicates, so wiring up additional operators should be mechanical. ### Considerations - Only rebalance operators that are genuinely associative in Comet's evaluation semantics. Integer `Add` / `Multiply` and the bitwise operators are associative. Floating-point `Add` / `Multiply` are not exactly associative, so reassociating them can change results; decide explicitly whether to rebalance those (and whether it matters given Spark itself reassociates) or restrict rebalancing to integral types. - Mirror the existing test approach from #4531: project a chain deeper than the recursion limit (e.g. 200) and assert Comet executes it natively with correct results. ### Relationship to #4531 / #4526 Follow-up to the boolean-only fix in #4531. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
