fallintoplace opened a new issue, #1103: URL: https://github.com/apache/iceberg-go/issues/1103
In #1098, reviewer feedback pointed out that `VisitNotEqual` and `VisitNotIn` still use `canContainNulls(fieldID) || canContainNans(fieldID)`. These branches have different semantics from the range/equality/IN visitors updated in #1098. Per Tanmay's review, the Java strict metrics evaluator uses `containsNullsOnly` / `containsNaNsOnly` for the negative predicate branches, then falls through to bounds checks. The Go implementation currently uses `canContainNulls` / `canContainNans`, which treats "any nulls/NaNs may exist" as enough to prove all rows match `!=` / `NOT IN`. That seems too optimistic: a file with some nulls and some values equal to the literal should not be considered guaranteed to match the negative predicate. Only an all-null/all-NaN column can be proven from that metric alone. Proposed fix: - update `VisitNotEqual` and `VisitNotIn` to use `containsNullsOnly` / `containsNaNsOnly` - add regression coverage for mixed null/value files where `canContainNulls` would be too broad - keep this separate from #1098 because the negative predicate semantics differ from the missing-metric conservative handling there -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
