fallintoplace opened a new issue, #1103:
URL: https://github.com/apache/iceberg-go/issues/1103

   In #1098, reviewer feedback pointed out that `VisitNotEqual` and 
`VisitNotIn` still use `canContainNulls(fieldID) || canContainNans(fieldID)`.
   
   These branches have different semantics from the range/equality/IN visitors 
updated in #1098. Per Tanmay's review, the Java strict metrics evaluator uses 
`containsNullsOnly` / `containsNaNsOnly` for the negative predicate branches, 
then falls through to bounds checks. The Go implementation currently uses 
`canContainNulls` / `canContainNans`, which treats "any nulls/NaNs may exist" 
as enough to prove all rows match `!=` / `NOT IN`.
   
   That seems too optimistic: a file with some nulls and some values equal to 
the literal should not be considered guaranteed to match the negative 
predicate. Only an all-null/all-NaN column can be proven from that metric alone.
   
   Proposed fix:
   
   - update `VisitNotEqual` and `VisitNotIn` to use `containsNullsOnly` / 
`containsNaNsOnly`
   - add regression coverage for mixed null/value files where `canContainNulls` 
would be too broad
   - keep this separate from #1098 because the negative predicate semantics 
differ from the missing-metric conservative handling there
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to