pepijnve commented on issue #21231:
URL: https://github.com/apache/datafusion/issues/21231#issuecomment-4343526798

   > Thus I think the "right" way to fix this bug (and avoid potential future 
bugs) is:
   > 1. Add an API to PhysicalExpr to report schema column used
   
   @alamb I agree with you that we should try to make the aspect of 'which 
columns do you need' and 'project yourself with this column mapping' part of 
the `PhysicalExpr` trait so that each implementation can handle this however it 
sees fit. That allows us to remove the downcasts from `Case` and wherever else 
similar logic exists.
   
   The part that I'm still not confident about is how we would control 
recursion. Would it be preferable for the 'collect columns' trait method to 
return the used columns for the physical expression itself and all its children 
(in other words, the method implementation itself handles the recursion) or 
should the recursion be done externally using a visitor?
   
   The tradeoff I see is that internal recursion potentially requires a lot of 
data copying. If the method returns a `Vec` for instance you have to merge Vecs 
at each recursion level, filter out duplicates, etc.
   
   With external recursion though we can avoid that and collect into a set, but 
I think we may need to use something like `Result<(Option<usize>, 
TreeNodeRecursion)>` as return type so that implementations can return an 
optional column index and also have scope control.
   
   A possible hybrid is internal recursion, but rather than returning values 
via the return value, we do so by passing in a consumer callback function. That 
avoids having to have `TreeNodeRecursion` in the method signature, and also 
avoids the need for recursive result merging, but then we probably have to have 
a dyn function parameter.
   
   Is there any prior art in the library for something like this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to