alamb commented on code in PR #21651:
URL: https://github.com/apache/datafusion/pull/21651#discussion_r3148377750
##########
datafusion/physical-expr/src/projection.rs:
##########
@@ -726,6 +729,100 @@ impl ProjectionExprs {
}
}
+/// Propagate min/max statistics through an expression using
+/// [`PhysicalExpr::evaluate_bounds`]. Works for any expression that
+/// implements `evaluate_bounds` (CAST, negation, arithmetic with literals, etc.).
+///
+/// Only applied when the expression references a single column at most once in
Review Comment:
I double-checked the `PhysicalExpr::evaluate_bounds` docs:
https://github.com/apache/datafusion/blob/dc6142ea4927e0853cf3a671e7055c41a1cad659/datafusion/physical-expr-common/src/physical_expr.rs#L170-L190
and they are not explicit about whether the bounds must be exact or may be an "envelope" (guaranteed to contain every possible output, but possibly wider than the actual min/max).
I will make a PR to update the documentation to explain that they are an envelope (though over time it may make sense to ensure the bounds are actually exact).
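To make the envelope/exact distinction concrete, here is a minimal sketch. The `Interval` type and the `negate_bounds`, `sin_bounds`, and `observed_range` helpers are hypothetical stand-ins, not DataFusion's `Interval` type or the real `evaluate_bounds` signature. An envelope only promises that every output lies inside the returned range; exact bounds additionally require the endpoints to match the real min/max.

```rust
// Hypothetical, simplified stand-ins for illustration only; these are not
// DataFusion's `Interval` type or the `evaluate_bounds` signature.

/// Closed interval `[lo, hi]`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Interval {
    lo: f64,
    hi: f64,
}

impl Interval {
    fn new(lo: f64, hi: f64) -> Self {
        assert!(lo <= hi, "lower bound must not exceed upper bound");
        Interval { lo, hi }
    }

    /// True if `v` lies inside the interval (endpoints included).
    fn contains(&self, v: f64) -> bool {
        self.lo <= v && v <= self.hi
    }
}

/// Bounds of `-x`: exact whenever the input bounds are exact, because
/// negation maps the input endpoints onto the output endpoints.
fn negate_bounds(input: Interval) -> Interval {
    Interval::new(-input.hi, -input.lo)
}

/// Bounds of `sin(x)`: always `[-1, 1]`, ignoring the input. This is a valid
/// envelope (every output lies inside it) but is usually not tight.
fn sin_bounds(_input: Interval) -> Interval {
    Interval::new(-1.0, 1.0)
}

/// Actual min/max of `f` over a non-empty set of input values.
fn observed_range(values: &[f64], f: impl Fn(f64) -> f64) -> Interval {
    let lo = values.iter().map(|&v| f(v)).fold(f64::INFINITY, f64::min);
    let hi = values.iter().map(|&v| f(v)).fold(f64::NEG_INFINITY, f64::max);
    Interval::new(lo, hi)
}

fn main() {
    // Column `x` whose min/max statistics are 0.1 and 0.5.
    let xs = [0.1, 0.3, 0.5];
    let x_bounds = Interval::new(0.1, 0.5);

    // Negation: the propagated bounds match the real output range.
    assert_eq!(negate_bounds(x_bounds), observed_range(&xs, |v| -v));

    // sin: the envelope contains every real output ...
    let envelope = sin_bounds(x_bounds);
    let actual = observed_range(&xs, f64::sin);
    assert!(envelope.contains(actual.lo) && envelope.contains(actual.hi));
    // ... but its endpoints are never attained, so it is not exact.
    assert_ne!(envelope, actual);
    println!("envelope = {envelope:?}, actual = {actual:?}");
}
```

Both functions produce sound bounds, but only the negation bounds could safely be reported as exact min/max statistics.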
##########
datafusion/physical-expr/src/projection.rs:
##########
@@ -726,6 +729,100 @@ impl ProjectionExprs {
}
}
+/// Propagate min/max statistics through an expression using
+/// [`PhysicalExpr::evaluate_bounds`]. Works for any expression that
+/// implements `evaluate_bounds` (CAST, negation, arithmetic with literals, etc.).
+///
+/// Only applied when the expression references a single column at most once in
Review Comment:
This is better, but I still think it is overly broad. For example, I think `MIN(sin(x))` will now also return the wrong answer, because the bounds for `sin` are hard-coded to `[-1, 1]` regardless of its input:
https://github.com/apache/datafusion/blob/5901df58b21b8b4e36011744e7ddc17bcb6a37b3/datafusion/functions/src/math/bounds.rs#L27-L32
So a query like
```sql
SELECT MIN(sin(x)), MAX(sin(x)) ...
```
will return `-1` and `1` even when no input value produces those results.
Here is a reproducer:
- https://github.com/Dandandan/arrow-datafusion/pull/361
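A minimal sketch of why an envelope cannot be reported as an exact MIN/MAX statistic. It assumes a hypothetical `Stat` enum and a hypothetical `answer_min` shortcut that answers `MIN` from statistics only when they are marked exact and otherwise computes it from the data; neither is DataFusion's statistics API.

```rust
/// Hypothetical statistic for a projected column: `Exact` values may be used
/// to answer MIN/MAX directly; `Inexact` values are only conservative bounds.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Stat {
    Exact(f64),
    Inexact(f64),
}

/// Hypothetical planner shortcut: answer `MIN(expr)` from statistics when they
/// are exact, otherwise fall back to computing it from the data.
fn answer_min(stat: Stat, scan: impl Fn() -> f64) -> f64 {
    match stat {
        Stat::Exact(v) => v,
        Stat::Inexact(bound) => {
            let v = scan();
            // An envelope's lower bound must never exceed the real minimum.
            debug_assert!(bound <= v, "lower bound is not conservative");
            v
        }
    }
}

/// Actual `MIN(sin(x))` over the data.
fn scan_min_sin(xs: &[f64]) -> f64 {
    xs.iter().map(|&x| x.sin()).fold(f64::INFINITY, f64::min)
}

fn main() {
    let xs = [0.1, 0.3, 0.5];
    // Lower end of the `sin` envelope, independent of the input range.
    let envelope_min = -1.0;

    // Treating the envelope as exact returns -1, which no input produces.
    let wrong = answer_min(Stat::Exact(envelope_min), || scan_min_sin(&xs));
    // Treating it as inexact forces a scan and returns sin(0.1).
    let right = answer_min(Stat::Inexact(envelope_min), || scan_min_sin(&xs));

    assert_eq!(wrong, -1.0);
    assert_eq!(right, 0.1f64.sin());
    println!("as exact: {wrong}, as inexact: {right}");
}
```

The envelope is still useful as an inexact bound (for example, for pruning), but only exact bounds can answer `MIN`/`MAX` without looking at the data.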