sentomk opened a new issue, #685:
URL: https://github.com/apache/iceberg-cpp/issues/685
**Summary**
`StrictMetricsEvaluator::CanContainNulls` and `CanContainNaNs` incorrectly
return `false` when the `null_value_counts` / `nan_value_counts` map is
non-empty but does not contain an entry for the queried field. This causes the
evaluator to erroneously return `kRowsMustMatch`, potentially skipping
row-level filtering and returning rows that do not satisfy the predicate.
**Root Cause**
In `src/iceberg/expression/strict_metrics_evaluator.cc`:
```cpp
bool CanContainNulls(int32_t id) {
  if (data_file_.null_value_counts.empty()) {
    return true;
  }
  auto it = data_file_.null_value_counts.find(id);
  return it != data_file_.null_value_counts.cend() && it->second > 0;
  // ^^^ when field is missing from map, this evaluates to false
}
```
The same pattern exists in `CanContainNaNs`.
**Reproduction**
```cpp
auto data_file = std::make_shared<DataFile>();
data_file->record_count = 50;
data_file->value_counts = {{14, 50L}};
data_file->null_value_counts = {{4, 0L}, {5, 0L}}; // field 14 missing
data_file->nan_value_counts = {{8, 0L}}; // field 14 missing
data_file->upper_bounds = {{14, Literal::Double(100.0).Serialize().value()}};
data_file->lower_bounds = {{14, Literal::Double(1.0).Serialize().value()}};
// Evaluating: no_nan_stats < 200.0
// Expected: kRowsMightNotMatch (null count unknown)
// Actual: kRowsMustMatch (incorrectly skips filtering)
```
**Proposed Fix**
`CanContainNulls`: if the schema marks the field as required, return `false`; if
the field has no entry in a non-empty map, return `true` (conservative).
`CanContainNaNs`: if the field type is not float/double, return `false`; if the
field has no entry in a non-empty map, return `true` (conservative).
This aligns with Java's `StrictMetricsEvaluator.canContainNulls()` /
`canContainNaNs()`, which return `true` when the field is missing from the map.
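A minimal, self-contained sketch of the proposed checks, to make the intended behavior concrete. It is not the actual patch: `FileCounts` is a hypothetical stand-in for the relevant `DataFile` members, and the `is_required` / `is_floating` flags are assumed to come from a schema lookup that the real evaluator would perform itself.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for the two DataFile maps involved in this issue.
struct FileCounts {
  std::unordered_map<int32_t, int64_t> null_value_counts;
  std::unordered_map<int32_t, int64_t> nan_value_counts;
};

// is_required is assumed to be resolved from the schema by the caller.
bool CanContainNulls(const FileCounts& file, int32_t id, bool is_required) {
  if (is_required) {
    return false;  // required fields never hold nulls
  }
  auto it = file.null_value_counts.find(id);
  if (it == file.null_value_counts.cend()) {
    return true;  // no stats (empty map or missing entry): be conservative
  }
  return it->second > 0;
}

// is_floating is assumed to be true only for float/double fields.
bool CanContainNaNs(const FileCounts& file, int32_t id, bool is_floating) {
  if (!is_floating) {
    return false;  // only floating-point types can hold NaN
  }
  auto it = file.nan_value_counts.find(id);
  if (it == file.nan_value_counts.cend()) {
    return true;  // no stats (empty map or missing entry): be conservative
  }
  return it->second > 0;
}
```

With the maps from the reproduction, both functions return `true` for field 14 in this sketch, so a strict evaluator built on them could no longer conclude that all rows must match.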
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]