andygrove opened a new pull request, #1586:
URL: https://github.com/apache/datafusion-ballista/pull/1586
## Summary
Fixes TPC-H Q2 (and any query using subqueries) failing in the Python client
with:
```
Proto serialization error: Expr::ScalarSubquery(_) | Expr::InSubquery(_) |
Expr::Exists { .. } | Expr::OuterReferenceColumn not supported
```
**Root cause:** `_to_internal_df()` was calling `logical_plan().to_proto()`,
which serializes the *unoptimized* plan. Subquery expressions
(`ScalarSubquery`, `InSubquery`, `Exists`, `OuterReferenceColumn`) are not
supported by DataFusion's proto serializer, and they're still present in the
unoptimized plan.
The Rust client is unaffected because DataFusion's query optimizer runs
*before* `BallistaQueryPlanner::create_physical_plan()` is called. Optimizer
rules such as `DecorrelatePredicateSubquery` (for `IN`/`EXISTS` subqueries) and
`ScalarSubqueryToJoin` (for scalar subqueries) rewrite subqueries into joins
first, so the plan handed to the serializer contains no subquery nodes.
**Fix:** one line — use `optimized_logical_plan().to_proto()` so the Python
client serializes the same optimizer output that the Rust path sends to the
scheduler.
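
To illustrate why the choice of plan matters, here is a minimal, self-contained sketch. It is a toy model, not DataFusion's or Ballista's real API: `Node`, `to_proto`, and `optimize` are hypothetical stand-ins that only mimic the behavior described above (a serializer that rejects subquery nodes, and an optimizer pass that rewrites them into joins). Real decorrelation is considerably more involved than this one-for-one node swap.

```python
from dataclasses import dataclass, field

# Toy stand-ins for illustration only; these are NOT DataFusion types.
SUBQUERY_KINDS = {"ScalarSubquery", "InSubquery", "Exists"}


@dataclass
class Node:
    kind: str                                   # e.g. "Filter", "Scan", "Join"
    children: list = field(default_factory=list)


def to_proto(plan: Node) -> str:
    """Toy serializer: fails on any subquery node, like the error in this PR."""
    if plan.kind in SUBQUERY_KINDS:
        raise ValueError(f"Proto serialization error: {plan.kind} not supported")
    inner = ",".join(to_proto(child) for child in plan.children)
    return f"{plan.kind}({inner})"


def optimize(plan: Node) -> Node:
    """Toy optimizer pass: replace every subquery node with a Join node."""
    children = [optimize(child) for child in plan.children]
    kind = "Join" if plan.kind in SUBQUERY_KINDS else plan.kind
    return Node(kind, children)


# An unoptimized plan that still contains a scalar subquery.
unoptimized = Node("Filter", [Node("ScalarSubquery", [Node("Scan")]), Node("Scan")])

# Before the fix: serializing the unoptimized plan raises.
try:
    to_proto(unoptimized)
    unoptimized_ok = True
except ValueError:
    unoptimized_ok = False

# After the fix: optimize first, then serialize.
serialized = to_proto(optimize(unoptimized))
```

In the toy model the first call fails and the second succeeds, which mirrors the difference between `logical_plan().to_proto()` and `optimized_logical_plan().to_proto()` in the Python client.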
Fixes #1581
## Test plan
- [ ] TPC-H Q2 runs successfully via Python client
- [ ] All 22 TPC-H queries pass via Python client
- [ ] Existing Python tests pass
🤖 Generated with [Claude Code](https://claude.com/claude-code)