gstvg commented on PR #21679: URL: https://github.com/apache/datafusion/pull/21679#issuecomment-4323190771
@rluvaton @comphead @LiaCastaneda @pepijnve I pushed https://github.com/apache/datafusion/pull/21679/changes/284b5d2be826651f9863ba09d8a789b15f7a9cc9 to support flexible array_reduce signature, and pinged you all at the main entry point of the commit https://github.com/apache/datafusion/pull/21679#discussion_r3144259358 Also pulled breaking changes from lambda capture (https://github.com/apache/datafusion/pull/21679/commits/5b42e48a27ca38e545a02d0ac91ec692d365dc84) and making lambda variable field optional(https://github.com/apache/datafusion/pull/21679/commits/e83107e1d10fc42139c3ddccc9f66bd42a0ac1d3), and removed direct support for fixed size lists in array_transform and coerced them to normal lists instead (https://github.com/apache/datafusion/pull/21679/commits/7bc1f5509eb1c39c37582e27a46f74793befc811) Finally, I noted two minor and breaking improvements but didn't want to delay this even more but we can still do it: 1. Make logical and physical lambda able to store it's fields, and not only it's names: ```rust struct Lambda { pub params: LambdaParams, pub body: Box<Expr>, } pub enum LambdaParams { Bound(Vec<FieldRef>), // sql planner produces bound lambdas, a follow-on pr will add a helper to bound all lambdas of a expr tree and/or a logical plan Unbound(Vec<String>), } pub struct LambdaExpr { params: Vec<FieldRef>, body: Arc<dyn PhysicalExpr>, } ``` This would eliminate the following two calls to lambda_parameters(it would only be called during sql planning), as the lambda should already contain the fields(unbound lambdas raises an error during planning as unbound lambda variable): https://github.com/apache/datafusion/blob/c6324c4e7daf0162a320d744f6354b3a2f385f83/datafusion/physical-expr/src/higher_order_function.rs#L285-L293 https://github.com/apache/datafusion/blob/c6324c4e7daf0162a320d744f6354b3a2f385f83/datafusion/physical-expr/src/planner.rs#L408-L463 For function implementors this won't change anything, but would be easier to write a custom physical expr planner, for example. You can see how it would look like on this commit https://github.com/gstvg/arrow-datafusion/commit/3a8ad01ca3ed62d2aad3633131c0041e1b04f810 2. Disambiguate LambdaVariable name ```rust enum LambdaVariableField { Unbound(String), Bound(FieldRef), } struct LambdaVariable { pub field: LambdaVariableField, pub spans: Spans, } ``` This would remove the ambiguity between the name property and the FieldRef name -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
