malinjawi opened a new issue, #49885:
URL: https://github.com/apache/arrow/issues/49885

   ### Describe the enhancement requested
   
   This is follow-up work to GH-33985 / PR #34834 now that Substrait can represent unresolved / partially bound expressions (see substrait-io/substrait#515).
   
   Arrow can currently deserialize bound Substrait `ExtendedExpression` messages, but it cannot yet consume unresolved expressions that contain:
   - `Expression.NamedExpression`
   - `Type.Unknown`
   - unresolved function signatures such as `add:unknown_unknown`
   
   To support front-end filter / projection workflows, Arrow should be able to deserialize these messages using a supplied Arrow schema, bind unresolved names and types against that schema, and then return normal Arrow compute expressions.
   
   Concretely, this means:
   - binding `NamedExpression` to Arrow `FieldRef`
   - treating `Type.Unknown` as a bind-time placeholder instead of an executable Arrow type
   - allowing schema-aware deserialization of `ExtendedExpression`
   - exposing that path in both C++ and Python APIs
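
The binding step described above can be sketched in plain Python. This is only an illustration of the idea, not Arrow or Substrait code: `NamedRef`, `FieldRef`, `Call`, `bind`, and the `(name, type)` schema list are hypothetical stand-ins, and the rule that a call's result type equals its first argument's type is a simplifying assumption made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Hypothetical stand-ins for illustration; none of these are Arrow or
# Substrait classes.
UNKNOWN = "unknown"  # plays the role of Type.Unknown


@dataclass(frozen=True)
class NamedRef:
    """Unresolved column reference (the role of Expression.NamedExpression)."""
    name: str


@dataclass(frozen=True)
class FieldRef:
    """Bound reference: a position in the schema plus a concrete type."""
    index: int
    type: str


@dataclass(frozen=True)
class Call:
    """Function call; the signature may still mention 'unknown'."""
    signature: str
    args: Tuple["Node", ...]
    type: str = UNKNOWN


Node = Union[NamedRef, FieldRef, Call]
Schema = List[Tuple[str, str]]  # ordered (field name, type name) pairs


def bind(node: Node, schema: Schema) -> Node:
    """Resolve names and unknown types in `node` against `schema`."""
    if isinstance(node, NamedRef):
        for index, (name, type_) in enumerate(schema):
            if name == node.name:
                return FieldRef(index, type_)
        raise KeyError(f"no field named {node.name!r} in schema")
    if isinstance(node, Call):
        # Bind the arguments first so their concrete types are known.
        args = tuple(bind(arg, schema) for arg in node.args)
        arg_types = [arg.type for arg in args]
        if UNKNOWN in arg_types:
            raise TypeError(f"could not resolve argument types of {node.signature}")
        function_name = node.signature.split(":", 1)[0]
        signature = f"{function_name}:{'_'.join(arg_types)}"
        # Assumption for this sketch: result type = first argument's type.
        return Call(signature, args, arg_types[0])
    return node  # already bound


schema = [("x", "i64"), ("y", "i64")]
unresolved = Call("add:unknown_unknown", (NamedRef("x"), NamedRef("y")))
bound = bind(unresolved, schema)
print(bound.signature)
```

In this sketch `Type.Unknown` never survives binding: either every argument type resolves and the signature is rewritten, or binding fails before anything executable is produced.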
   
   The expected API shape is something like:
   - C++: `DeserializeExpressions(buf, input_schema, ...)`
   - Python:
     - `pyarrow.substrait.deserialize_expressions(buf, schema=...)`
     - `pyarrow.substrait.BoundExpressions.from_substrait(..., schema=...)`
     - `pyarrow.compute.Expression.from_substrait(..., schema=...)`
   
   This work depends on the Substrait protocol change in substrait-io/substrait#515.
   
   
   ### Component(s)
   
   C++, Python


