Nathan-Fenner opened a new issue, #1354:
URL: https://github.com/apache/iceberg-rust/issues/1354

   ### Apache Iceberg Rust version
   
   None
   
   ### Describe the bug
   
   The current implementation for partition filtering treats a missing `lower_bound`/`upper_bound` value as though all rows are null:
   
   
   Current [manifest_evaluator.rs](https://github.com/apache/iceberg-rust/blob/50de31a5ef7518aeefad88ed9d815cd24ca962b8/crates/iceberg/src/expr/visitors/manifest_evaluator.rs#L156-L161):
   ```rs
       fn less_than(
           &mut self,
           reference: &BoundReference,
           datum: &Datum,
           _predicate: &BoundPredicate,
       ) -> crate::Result<bool> {
           let field = self.field_summary_for_reference(reference);
           match &field.lower_bound {
               Some(bound) if datum <= bound => ROWS_CANNOT_MATCH,
               Some(_) => ROWS_MIGHT_MATCH,
               None => ROWS_CANNOT_MATCH,
           }
       }
   ```
   
   This means that if statistics were not computed for a given partition file, that file is always excluded, whatever value the predicate compares against.
   
   For comparison, the [Java implementation](https://github.com/apache/iceberg/blob/a7f3dc79a2f42a4875ac35eec2137ecff15204fc/api/src/main/java/org/apache/iceberg/expressions/InclusiveMetricsEvaluator.java#L210-L213) handles this correctly:
   
   ```java
         T lower = lowerBound(term);
         if (null == lower || NaNUtil.isNaN(lower)) {
           // NaN indicates unreliable bounds. See the InclusiveMetricsEvaluator docs for more.
           return ROWS_MIGHT_MATCH;
         }
   ```
   
   by treating a `null` lower bound as indicating that all rows might match.
   
   ### To Reproduce
   
   _No response_
   
   ### Expected behavior
   
   A partition file with a missing `lower_bound` column should _not be excluded_ (should be included) from scans that filter on that column with `<`/`<=`/`>`/`>=`.
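
A minimal sketch of the expected behavior, for illustration only. It uses a simplified, hypothetical `FieldSummary` with `i64` bounds rather than the crate's real `FieldSummary`, `Datum`, and `BoundReference` types, and the free functions below are stand-ins for the evaluator's visitor methods. The only point it makes is that a missing bound falls through to `ROWS_MIGHT_MATCH`, mirroring the Java check quoted above:

```rust
// Hypothetical, simplified stand-ins for the crate's constants and types.
const ROWS_MIGHT_MATCH: bool = true;
const ROWS_CANNOT_MATCH: bool = false;

/// Simplified partition field summary. A bound is `None` when no
/// statistics were recorded for the field.
struct FieldSummary {
    lower_bound: Option<i64>,
    upper_bound: Option<i64>,
}

/// `column < datum`: rows can be ruled out only when a known lower bound
/// is already greater than or equal to `datum`.
fn less_than(field: &FieldSummary, datum: i64) -> bool {
    match field.lower_bound {
        Some(bound) if datum <= bound => ROWS_CANNOT_MATCH,
        // A usable bound that admits a match, or no bound at all:
        // nothing can be ruled out, so the file must be kept.
        Some(_) | None => ROWS_MIGHT_MATCH,
    }
}

/// `column > datum`: the symmetric check against the upper bound.
fn greater_than(field: &FieldSummary, datum: i64) -> bool {
    match field.upper_bound {
        Some(bound) if datum >= bound => ROWS_CANNOT_MATCH,
        Some(_) | None => ROWS_MIGHT_MATCH,
    }
}
```

With this shape, a summary whose bounds are both `None` is kept for every comparison value, while a summary with real bounds is still pruned when the predicate falls outside them.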
   
   ### Willingness to contribute
   
   I can contribute a fix for this bug independently


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

