kevinjqliu opened a new issue, #3498:
URL: https://github.com/apache/iceberg-python/issues/3498

   Several visitor/evaluator edge cases appear unsafe or inconsistent:
   
   1. `_StrictMetricsEvaluator.visit_not_equal` / `visit_not_in` return `ROWS_MUST_MATCH` whenever a file can contain nulls or NaNs. For example, for a file whose column values are `[null, 5]` or `[NaN, 5.0]`, with lower and upper bounds both equal to `5`, `NotEqualTo("x", 5)` / `NotIn("x", {5})` evaluate to true, even though the row holding `5` does not match. This can cause whole data files to be incorrectly marked as deleted.
   
   2. `_StrictMetricsEvaluator.eval` returns `ROWS_MUST_MATCH` whenever `record_count <= 0`. That is vacuously correct for `record_count=0`, but per the local code comment `record_count=-1` means the count is unknown, so no match can be guaranteed; yet even `AlwaysFalse()` evaluates to true for such a file.
   
   3. `ResidualVisitor` comparison methods compare partition values to literals directly. For a nullable identity partition whose value is `None`, `LessThan("x", 1)` raises `TypeError`, whereas row evaluation returns false.
   
   4. `ResidualVisitor.visit_not_nan(None)` returns `AlwaysFalse`, while expression evaluation treats `NotNaN(None)` as true. Existing tests assert both behaviors, so the two code paths disagree on the semantics of `NotNaN` for null values.
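
The first two cases can be reproduced with a small, self-contained model that does not use pyiceberg. `FileStats`, `issue_strict_not_equal`, `safer_strict_not_equal`, and `guarded_strict_eval` are hypothetical stand-ins, and `issue_strict_not_equal` models the behavior reported in item 1 rather than the actual pyiceberg source. The sketch assumes that null and NaN rows satisfy `x != 5` (as plain Python comparisons do: `None != 5` and `float("nan") != 5.0` are both true), that value counts include nulls and NaNs, and that bounds exclude NaN. It covers `NotEqualTo` only; `NotIn` would follow the same pattern.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical stand-ins for the evaluator's boolean results.
ROWS_MUST_MATCH = True
ROWS_MIGHT_NOT_MATCH = False


@dataclass
class FileStats:
    """Hypothetical single-column stats; value_count includes nulls and NaNs (assumption)."""

    value_count: int
    null_count: int
    nan_count: int
    lower: Optional[float]  # bounds exclude NaN (assumption)
    upper: Optional[float]


def rows_match_not_equal(rows: List[Optional[float]], literal: float) -> bool:
    """Ground truth: every row satisfies `x != literal` under plain Python semantics."""
    return all(value != literal for value in rows)


def _bounds_exclude(stats: FileStats, literal: float) -> bool:
    """True if the bounds prove that no non-null, non-NaN value equals `literal`."""
    if stats.lower is not None and stats.lower > literal:
        return True
    return stats.upper is not None and stats.upper < literal


def issue_strict_not_equal(stats: FileStats, literal: float) -> bool:
    """Models item 1: any null or NaN short-circuits to MUST_MATCH."""
    if stats.null_count > 0 or stats.nan_count > 0:
        return ROWS_MUST_MATCH  # unsafe: a non-null row may still equal `literal`
    return ROWS_MUST_MATCH if _bounds_exclude(stats, literal) else ROWS_MIGHT_NOT_MATCH


def safer_strict_not_equal(stats: FileStats, literal: float) -> bool:
    """Short-circuits only when every value is null or NaN, then falls back to bounds."""
    if stats.null_count + stats.nan_count == stats.value_count:
        return ROWS_MUST_MATCH
    return ROWS_MUST_MATCH if _bounds_exclude(stats, literal) else ROWS_MIGHT_NOT_MATCH


def guarded_strict_eval(record_count: int, evaluate: Callable[[], bool]) -> bool:
    """Hypothetical guard for item 2: only an empty file is vacuously MUST_MATCH."""
    if record_count == 0:
        return ROWS_MUST_MATCH  # no rows, so the claim holds vacuously
    if record_count < 0:
        return ROWS_MIGHT_NOT_MATCH  # unknown count (e.g. -1): nothing can be proven
    return evaluate()


nullable = FileStats(value_count=2, null_count=1, nan_count=0, lower=5, upper=5)
nan_file = FileStats(value_count=2, null_count=0, nan_count=1, lower=5.0, upper=5.0)

print(rows_match_not_equal([None, 5], 5))      # False: the row holding 5 fails
print(issue_strict_not_equal(nullable, 5))     # True: unsafe
print(safer_strict_not_equal(nullable, 5))     # False
print(issue_strict_not_equal(nan_file, 5.0))   # True: unsafe
print(safer_strict_not_equal(nan_file, 5.0))   # False
print(guarded_strict_eval(-1, lambda: False))  # False: an unknown count no longer yields MUST_MATCH
```

The design choice in `safer_strict_not_equal` is to treat "contains only nulls/NaNs" as the sole short-circuit and otherwise rely on the bounds, so a file with one null and one `5` is no longer reported as fully matching.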
   
   All four cases were validated against the current tree, and the examples use stats and partition shapes already supported by the repository's tests.
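
Items 3 and 4 can be illustrated with a small model of the residual behavior for an identity partition. This is a hypothetical sketch, independent of pyiceberg: `residual_less_than` and `residual_not_nan` stand in for the `ResidualVisitor` methods, and the strings `"AlwaysTrue"` / `"AlwaysFalse"` stand in for the real expression classes. It derives each residual from a row-level rule so the two paths cannot disagree, assuming the row-level semantics described in the issue (`LessThan` on `None` is false, `NotNaN(None)` is true) are taken as the reference.

```python
import math
from typing import Any

# Hypothetical stand-ins for the AlwaysTrue() / AlwaysFalse() residual expressions.
ALWAYS_TRUE = "AlwaysTrue"
ALWAYS_FALSE = "AlwaysFalse"


def row_less_than(value: Any, literal: Any) -> bool:
    """Row-level rule from item 3: a null value never satisfies LessThan."""
    if value is None:
        return False
    return value < literal


def row_not_nan(value: Any) -> bool:
    """Assumed row-level rule (consistent with item 4): only a float NaN fails NotNaN."""
    return not (isinstance(value, float) and math.isnan(value))


def _to_residual(matches: bool) -> str:
    return ALWAYS_TRUE if matches else ALWAYS_FALSE


def residual_less_than(partition_value: Any, literal: Any) -> str:
    # Reuses the null-safe row rule instead of comparing `partition_value < literal` directly.
    return _to_residual(row_less_than(partition_value, literal))


def residual_not_nan(partition_value: Any) -> str:
    # Derived from the row rule, so NotNaN(None) yields AlwaysTrue here.
    return _to_residual(row_not_nan(partition_value))


# A direct comparison with None, as described in item 3, raises TypeError.
try:
    None < 1
except TypeError as exc:
    print(f"TypeError: {exc}")

print(residual_less_than(None, 1))     # AlwaysFalse
print(residual_less_than(0, 1))        # AlwaysTrue
print(residual_not_nan(None))          # AlwaysTrue
print(residual_not_nan(float("nan")))  # AlwaysFalse
```

If the residual semantics were instead chosen as the reference, the same structure applies with `row_not_nan` changed, which keeps a single source of truth either way.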


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

