rdblue opened a new pull request, #12311:
URL: https://github.com/apache/iceberg/pull/12311

   This updates `InclusiveMetricsEvaluator`, which uses column stats to skip data 
files during scan planning.
   
   The evaluator was implementing the older `BoundExpressionVisitor` interface 
that only supported `BoundReference` and not other `BoundTerm` instances like 
`BoundTransform`. After #12304, `BoundExtract` also needs to be supported.
   
   Filtering works for transformed values when the transform is 
order-preserving. If it is not order-preserving (like `bucket`), the bounds 
cannot be used.
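
   A minimal sketch of why order preservation matters, not the code in this 
PR. The class, method names, and the `truncate` and `bucket` functions below 
are hypothetical stand-ins (the real `bucket` transform uses a hash, not a 
multiplication). It shows an inclusive check for `transform(col) == literal` 
that only trusts transformed bounds when the transform is order-preserving:

```java
import java.util.function.Function;

// Hypothetical illustration; none of these names come from Iceberg.
final class TransformBoundsSketch {

  // Stand-in for an order-preserving transform: round down to a multiple of width.
  static Function<Integer, Integer> truncate(int width) {
    return v -> v - Math.floorMod(v, width);
  }

  // Stand-in for a transform that does NOT preserve order.
  static Function<Integer, Integer> bucket(int numBuckets) {
    return v -> Math.floorMod(v * 7, numBuckets);
  }

  // Returns true if a file whose column values lie in [lower, upper]
  // might contain a row where transform(value) equals the literal.
  static <T, R extends Comparable<R>> boolean mightContainEq(
      T lower, T upper, Function<T, R> transform, boolean preservesOrder, R literal) {
    if (!preservesOrder) {
      // Transformed endpoints say nothing about values in between,
      // so the file cannot be skipped.
      return true;
    }
    R transformedLower = transform.apply(lower);
    R transformedUpper = transform.apply(upper);
    return transformedLower.compareTo(literal) <= 0
        && literal.compareTo(transformedUpper) <= 0;
  }
}
```

   With column bounds `[23, 47]`, `truncate(10)` maps the bounds to 
`[20, 40]`, so a file can be skipped for the literal `50`. With the stand-in 
`bucket(16)`, the endpoints map to `1` and `9`, but the in-range value `25` 
maps to `15`, so the transformed endpoints are not valid bounds.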
   
   Most of the changes to `BoundExpressionVisitor` are to produce the lower and 
upper bounds values that are tested:
   * For `BoundReference`, deserialize the bound from the correct `lowerBounds` 
or `upperBounds` map (moved to `parseLowerBound` and `parseUpperBound`)
   * For `BoundTransform`, deserialize the bound and transform it if the 
transform is order-preserving (in `transformLowerBound` and 
`transformUpperBound`)
   * For `BoundExtract`, deserialize the bound as a map from field name to 
`VariantValue`, then convert the value to the internal representation (in 
`extractLowerBound` and `extractUpperBound`)
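
   The first two cases can be sketched roughly as follows. This is a 
simplified illustration, not the PR's code: the method names mirror the ones 
above, but the signatures, the `int`-only handling, and the 
`BoundLookupSketch` class are assumptions. It also assumes the bound is a 
4-byte little-endian `int` stored in a map keyed by field ID.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, simplified stand-in for the bound lookup helpers.
final class BoundLookupSketch {

  // Helper for building example data: encode an int as 4 little-endian bytes.
  static ByteBuffer intToBuffer(int value) {
    ByteBuffer buffer = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
    buffer.putInt(0, value);
    return buffer;
  }

  // Reference case: read the stored bound for a field, or null if none exists.
  static Integer parseLowerBound(Map<Integer, ByteBuffer> lowerBounds, int fieldId) {
    ByteBuffer buffer = lowerBounds.get(fieldId);
    if (buffer == null) {
      return null;
    }
    // duplicate() leaves the shared buffer's position and byte order untouched.
    return buffer.duplicate().order(ByteOrder.LITTLE_ENDIAN).getInt(buffer.position());
  }

  // Transform case: only produce a bound when the transform preserves order.
  static <R> R transformLowerBound(
      Map<Integer, ByteBuffer> lowerBounds,
      int fieldId,
      Function<Integer, R> transform,
      boolean preservesOrder) {
    if (!preservesOrder) {
      return null; // no usable bound
    }
    Integer lower = parseLowerBound(lowerBounds, fieldId);
    return lower == null ? null : transform.apply(lower);
  }
}
```

   In this sketch a `null` result means "no usable bound", in which case an 
inclusive evaluator has to assume the file might match.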
   
   Note that this implementation **temporarily** uses Java serialization to 
convert the Variant field bounds to bytes. Once we have defined a serialization 
for these values, this should use it instead. Serialization and parsing are 
handled by `VariantDataUtil`.
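
   For readers unfamiliar with the temporary approach, a round trip through 
Java serialization looks roughly like this. This is a generic sketch, not the 
contents of `VariantDataUtil`: the class and method names are hypothetical, 
and plain `Serializable` values stand in for `VariantValue`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a field-name-to-bound map round trip.
final class FieldBoundsSerializationSketch {

  // Write the whole map with Java serialization and wrap the bytes.
  static ByteBuffer serialize(HashMap<String, Serializable> bounds) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(bounds);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return ByteBuffer.wrap(bytes.toByteArray());
  }

  // Read the map back without disturbing the caller's buffer position.
  @SuppressWarnings("unchecked")
  static Map<String, Serializable> parse(ByteBuffer buffer) {
    byte[] bytes = new byte[buffer.remaining()];
    buffer.duplicate().get(bytes);
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return (Map<String, Serializable>) in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

   Java serialization is tied to JVM class definitions, which is one reason a 
format like this would only be suitable as a placeholder until a 
language-neutral serialization is defined.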
   
   This adds new test suites for the new `BoundTerm` cases that are supported:
   * `TestInclusiveMetricsEvaluatorWithExtract` tests variant cases
   * `TestInclusiveMetricsEvaluatorWithTransforms` tests transform cases


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

