gstvg commented on issue #21231: URL: https://github.com/apache/datafusion/issues/21231#issuecomment-4377092569
At #21323, during physical planning, lambda scope schemas are created by *appending* the lambda parameters to the outer schema, so the outer schema fields still exist in the same positions, whether or not they are referenced (how it handles name shadowing is, I believe, not relevant for this discussion). Given this query:

```sql
create table t as select [[1, 2]] as a, 2 as b;

select array_transform(a, arr -> array_transform(arr, v -> v + b)) from t;
```

The schemas would be:

```text
[0 => a, 1 => b]
[0 => a, 1 => b, 2 => arr]
[0 => a, 1 => b, 2 => arr, 3 => v]
```

The planned query with indices, which is what is visible via tree node traversals:

```sql
select array_transform(a@0, arr@2 -> array_transform(arr@2, v@3 -> v@3 + b@1)) from t;
```

So every scoped schema is a superset of the root schema, and every column reference, whether or not it is inside a lambda scope, has an index that is valid relative to the root schema (lambda variable references use a different expression type).

Then, as an internal optimization that is not visible to the external world, similar to the case optimization, the lambda derives a projected body that works with projected batches, which don't include unreferenced columns or lambda variables:

```text
[0 => a, 1 => b]
[0 => b, 1 => arr]
[0 => b, 1 => v]
```

```sql
select array_transform(a@0, arr@1 -> array_transform(arr@1, v@1 -> v@1 + b@0)) from t;
```
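
To make the index bookkeeping concrete, here is a minimal standalone sketch of the two steps described above: appending lambda parameters to the outer schema, then projecting a scope down to the fields a lambda body references. It is not the code from #21323. `append_scope` and `project_scope` are hypothetical helpers, schemas are modeled as plain lists of names, and lambda variables are treated like ordinary indexed fields even though the real implementation uses a different expression type for them.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical helper: build a lambda scope schema by appending the lambda
/// parameters to the outer schema, so outer fields keep their positions.
fn append_scope<S: AsRef<str>>(outer: &[S], params: &[&str]) -> Vec<String> {
    let mut scoped: Vec<String> = outer.iter().map(|f| f.as_ref().to_string()).collect();
    scoped.extend(params.iter().map(|p| p.to_string()));
    scoped
}

/// Hypothetical helper: keep only the referenced fields of a scoped schema,
/// preserving their relative order, and return the old -> new index mapping
/// needed to rewrite the lambda body. Panics on an out-of-range index.
fn project_scope(
    schema: &[String],
    referenced: &[usize],
) -> (Vec<String>, BTreeMap<usize, usize>) {
    // A sorted, deduplicated set of the indices the body actually uses.
    let kept: BTreeSet<usize> = referenced.iter().copied().collect();
    let mut projected = Vec::with_capacity(kept.len());
    let mut remap = BTreeMap::new();
    for (new_idx, old_idx) in kept.into_iter().enumerate() {
        projected.push(schema[old_idx].clone());
        remap.insert(old_idx, new_idx);
    }
    (projected, remap)
}
```

With the root schema `[a, b]`, appending `arr` and then `v` gives the three scoped schemas listed above. Projecting the outer scope on the referenced indices `{1, 2}` gives `[b, arr]` with the mapping `1 -> 0, 2 -> 1`, and projecting the inner scope on `{1, 3}` gives `[b, v]` with `1 -> 0, 3 -> 1`, matching the projected query.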
