malinjawi opened a new issue, #49885:
URL: https://github.com/apache/arrow/issues/49885
### Describe the enhancement requested
This is follow-up work to GH-33985 / PR #34834 now that Substrait can
represent unresolved / partially bound expressions (see
substrait-io/substrait#515).
Arrow can currently deserialize bound Substrait `ExtendedExpression`
messages, but it cannot yet consume unresolved expressions that contain:
- `Expression.NamedExpression`
- `Type.Unknown`
- unresolved function signatures such as `add:unknown_unknown`
To support front-end filter / projection workflows, Arrow should be able to
deserialize these messages using a supplied Arrow schema, bind unresolved names
and types against that schema, and then return normal Arrow compute expressions.
Concretely, this means:
- binding `NamedExpression` to Arrow `FieldRef`
- treating `Type.Unknown` as a bind-time placeholder instead of an
executable Arrow type
- allowing schema-aware deserialization of `ExtendedExpression`
- exposing that path in both C++ and Python APIs
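
To make the bind step concrete, here is a minimal pure-Python sketch of what "binding against a schema" could look like. It is not Arrow code and does not reflect how the C++ implementation would be structured: `NamedRef`, `Call`, `BoundField`, `BoundCall`, and `bind` are hypothetical names invented for illustration, the schema is modeled as an ordered `dict` of field name to Substrait-style short type name, and only top-level field references as call arguments are handled.

```python
from dataclasses import dataclass

UNKNOWN = "unknown"  # stand-in for the Type.Unknown placeholder (hypothetical)


@dataclass(frozen=True)
class NamedRef:
    """Unresolved column reference, loosely analogous to NamedExpression."""
    names: tuple


@dataclass(frozen=True)
class Call:
    """Function call whose signature may still contain unknown argument types."""
    signature: str  # e.g. "add:unknown_unknown"
    args: tuple


@dataclass(frozen=True)
class BoundField:
    """Resolved reference: field name, position in the schema, and type."""
    name: str
    index: int
    type: str


@dataclass(frozen=True)
class BoundCall:
    """Function call with a fully resolved signature and bound arguments."""
    signature: str
    args: tuple


def bind(expr, schema):
    """Resolve names and unknown types in expr against an ordered name->type dict."""
    if isinstance(expr, NamedRef):
        name = expr.names[0]  # nested (struct) names are out of scope here
        if name not in schema:
            raise KeyError(f"field {name!r} not found in schema")
        return BoundField(name, list(schema).index(name), schema[name])

    if isinstance(expr, Call):
        bound_args = tuple(bind(arg, schema) for arg in expr.args)
        if not all(isinstance(arg, BoundField) for arg in bound_args):
            # Nested calls would need return-type inference, omitted in this sketch.
            raise NotImplementedError("only field references are supported as arguments")

        func_name, _, arg_part = expr.signature.partition(":")
        declared = arg_part.split("_") if arg_part else []
        if len(declared) != len(bound_args):
            raise ValueError(f"signature {expr.signature!r} does not match argument count")

        # Replace each unknown slot with the type of the bound argument;
        # slots that were already typed are kept as declared.
        resolved = [
            arg.type if slot == UNKNOWN else slot
            for slot, arg in zip(declared, bound_args)
        ]
        return BoundCall(f"{func_name}:{'_'.join(resolved)}", bound_args)

    raise TypeError(f"unsupported expression node: {type(expr).__name__}")


schema = {"id": "i32", "x": "i64", "y": "i64"}
unresolved = Call("add:unknown_unknown", (NamedRef(("x",)), NamedRef(("y",))))
bound = bind(unresolved, schema)
print(bound.signature)  # add:i64_i64
```

The point of the sketch is that `unknown` never survives binding: once the names are resolved against the supplied schema, the signature is fully typed and the result could be handed to normal function lookup.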
The expected API shape is something like:
- C++: `DeserializeExpressions(buf, input_schema, ...)`
- Python:
  - `pyarrow.substrait.deserialize_expressions(buf, schema=...)`
  - `pyarrow.substrait.BoundExpressions.from_substrait(..., schema=...)`
  - `pyarrow.compute.Expression.from_substrait(..., schema=...)`
This work depends on the Substrait protocol change in
substrait-io/substrait#515.
### Component(s)
C++, Python