pepijnve commented on issue #21231: URL: https://github.com/apache/datafusion/issues/21231#issuecomment-4343526798
> Thus I think the "right" way to fix this bug (and avoid potential future bugs) is: > 1. Add an API to PhysicalExpr to report schema column used @alamb I agree with you that we should try to make the aspect of 'which columns do you need' and 'project yourself with this column mapping' part of the `PhysicalExpr` trait so that each implementation can handle this however it sees fit. That allows us to remove the downcasts from `Case` and wherever else similar logic exists. The part that I'm still not confident about is how we would control recursion. Would it be preferable for the 'collect columns' trait method to return the used columns for the physical expression itself and all its children (in other words, the method implementation itself handles the recursion) or should the recursion be done externally using a visitor? The tradeoff I see is that internal recursion potentially requires a lot of data copying. If the method returns a `Vec` for instance you have to merge Vecs at each recursion level, filter out duplicates, etc. With external recursion though we can avoid that and collect into a set, but I think we may need to use something like `Result<(Option<usize>, TreeNodeRecursion)>` as return type so that implementations can return an optional column index and also have scope control. A possible hybrid is internal recursion, but rather than returning values via the return value, we do so by passing in a consumer callback function. That avoids having to have `TreeNodeRecursion` in the method signature, and also avoids the need for recursive result merging, but then we probably have to have a dyn function parameter. Is there any prior art in the library for something like this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
