andygrove opened a new issue, #4577:
URL: https://github.com/apache/datafusion-comet/issues/4577

   ### Background
   
   #4531 (closing #4526) fixed deep `And` / `Or` chains overflowing protobuf's 
default recursion limit (100) when the serialized plan is re-parsed, on the JVM 
via `OperatorOuterClass.Operator.parseFrom` (e.g. `findShuffleScanIndices` / 
explain) and in the Rust `prost` decoder. The fix flattens an associative chain 
(`QueryPlanSerde.flattenAssociative`) and rebuilds it as a balanced `O(log 
n)`-depth tree (`QueryPlanSerde.createBalancedBinaryExpr`), routing `CometAnd` 
/ `CometOr` through it.
   
   As @mbutrovich noted in 
[review](https://github.com/apache/datafusion-comet/pull/4531#pullrequestreview-4403399601),
 that fix only covers the boolean case. The protobuf recursion limit applies to 
any deeply nested `BinaryExpr`, so a long left-deep chain of other associative 
binary operators hits the same overflow.
   
   ### Proposal
   
   Extend the rebalancing introduced in #4531 to other associative binary 
expressions, for example `Add`, `Multiply`, and the bitwise operators 
(`BitwiseAnd`, `BitwiseOr`, `BitwiseXor`). The `flattenAssociative` / 
`createBalancedBinaryExpr` helpers are already generic over the wrap function 
and the match/children predicates, so wiring up additional operators should be 
mechanical.
   
   ### Considerations
   
   - Only rebalance operators that are genuinely associative in Comet's 
evaluation semantics. Integer `Add` / `Multiply` and the bitwise operators are 
associative. Floating-point `Add` / `Multiply` are not exactly associative, so 
reassociating them can change results; decide explicitly whether to rebalance 
those (and whether it matters given Spark itself reassociates) or restrict 
rebalancing to integral types.
   - Mirror the existing test approach from #4531: project a chain deeper than 
the recursion limit (e.g. 200) and assert Comet executes it natively with 
correct results.
   
   ### Relationship to #4531 / #4526
   
   Follow-up to the boolean-only fix in #4531.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to