gstvg commented on code in PR #21323:
URL: https://github.com/apache/datafusion/pull/21323#discussion_r3043329706
##########
datafusion/physical-expr/src/expressions/case.rs:
##########
@@ -133,6 +133,10 @@ impl CaseBody {
expr.apply(|expr| {
if let Some(column) = expr.as_any().downcast_ref::<Column>() {
used_column_indices.insert(column.index());
+ } else if let Some(lambda_variable) =
+ expr.as_any().downcast_ref::<LambdaVariable>()
+ {
+ used_column_indices.insert(lambda_variable.index());
Review Comment:
I added a sqllogictest that I hope covers all the cases you cited,
and more
(https://github.com/apache/datafusion/pull/21323/changes/4932cae74e4338bc6e148887cd862f7ab5f4d43c).
Compared to your snippet at
https://github.com/apache/datafusion/issues/21231#issuecomment-4164883881, where
lambda variables come first in the scoped schema and external columns
after them, here lambda variables are pushed to the end of the outer schema,
which still includes unreferenced columns. In case of a name conflict (a
lambda variable shadows a field from the outer schema), we rename the shadowed
field to a unique name
(https://github.com/apache/datafusion/pull/21323/changes/5c5ca195d1fc2c708a1e4964e8392337fc3b0b02#diff-a3e127629e9516ec496d656ebb53a1e8bf730eb02d219c4ce42ee47572685844R253-R325,
https://github.com/apache/datafusion/pull/21323/changes/5c5ca195d1fc2c708a1e4964e8392337fc3b0b02#diff-7fb0a64e734f54d94d48e9e02c51573a3678205f9ee8e2afaf41d686187a285eR440-R489).
That way,
once a field has been introduced into a schema, be it a column in the
outermost schema or a lambda variable in an inner schema, its index never
changes, regardless of how many new scopes are derived from it down the tree.
Because of that, the CASE WHEN optimization (as well as the same optimization in
lambdas) can safely collect all indices and assume that any index that is
out of bounds for the scoped batch it is projecting refers to an inner lambda
variable that is not yet available. It still needs to rewrite all of them, since
they were originally computed against the unprojected, full schema: any
projection of an outer schema shifts the indices of all its derived inner
schemas, so the rewrite must be propagated down the tree for every projection
(inner projections have no way of knowing how to rewrite indices for an outer
projection).
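
To make the index reasoning above concrete, here is a minimal standalone
sketch in plain Rust. It is not the code from the PR: `push_scope` and
`remap_indices` are hypothetical names, schemas are modelled as plain vectors
of field names, and the `#shadowed` renaming scheme is made up for
illustration. It also assumes that an out-of-bounds index is shifted down by
the number of columns the projection drops, which is my reading of the
behaviour described here rather than something confirmed against the PR.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical helper: derive a scoped schema by appending lambda variables
/// to the END of the outer schema. Existing fields keep their index; a field
/// shadowed by a lambda variable is renamed (illustrative naming scheme,
/// a real implementation would need to guarantee uniqueness).
fn push_scope(outer: &[String], lambda_vars: &[&str]) -> Vec<String> {
    let mut scoped: Vec<String> = outer.to_vec();
    for var in lambda_vars {
        for (i, name) in scoped.iter_mut().enumerate() {
            if name.as_str() == *var {
                *name = format!("{var}#shadowed{i}");
            }
        }
        scoped.push(var.to_string());
    }
    scoped
}

/// Hypothetical helper: given every index referenced by an expression and the
/// width of the batch currently being projected, return the projection to
/// apply to that batch plus an old-index -> new-index mapping.
///
/// Indices >= `batch_width` are assumed to refer to inner lambda variables
/// that are not available yet; they cannot be projected, but they still have
/// to be rewritten because the columns in front of them shrink.
fn remap_indices(
    used: &BTreeSet<usize>,
    batch_width: usize,
) -> (Vec<usize>, HashMap<usize, usize>) {
    // Columns of the current batch that are actually referenced, in order.
    let projection: Vec<usize> = used
        .iter()
        .copied()
        .filter(|&i| i < batch_width)
        .collect();
    // Number of columns removed from the current batch by the projection.
    let dropped = batch_width - projection.len();

    let mut mapping = HashMap::new();
    for (new_idx, &old_idx) in projection.iter().enumerate() {
        mapping.insert(old_idx, new_idx);
    }
    // Assumption: inner variables sit after the current batch, so they shift
    // down by exactly the number of dropped columns.
    for &old_idx in used.iter().filter(|&&i| i >= batch_width) {
        mapping.insert(old_idx, old_idx - dropped);
    }
    (projection, mapping)
}
```

For example, with an outer schema `[a, b, x]` and a lambda variable `x`,
`push_scope` keeps `a` and `b` at indices 0 and 1, renames the outer `x` in
place at index 2, and appends the lambda `x` at index 3. If an expression then
references indices 1, 3 and 4 while the batch being projected has only 3
columns, `remap_indices` projects column 1 and rewrites 3 and 4 as inner
variables that will only exist further down the tree.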
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]