kevinjqliu opened a new issue, #732:
URL: https://github.com/apache/iceberg-cpp/issues/732

   `StrictMetricsEvaluator::Evaluate` currently returns `ROWS_MUST_MATCH` when 
`data_file.record_count <= 0`:
   
   
https://github.com/apache/iceberg-cpp/blob/c0c6b01393070b0813f49b0d6220c98256379cef/src/iceberg/expression/strict_metrics_evaluator.cc#L534-L537
   
   This is correct for `record_count == 0` (an empty file vacuously satisfies 
any predicate), but not for `record_count == -1`. iceberg-cpp uses `-1` as an 
unknown row-count sentinel when writer metrics do not include `row_count`:
   
   - data writer: 
https://github.com/apache/iceberg-cpp/blob/c0c6b01393070b0813f49b0d6220c98256379cef/src/iceberg/data/data_writer.cc#L80-L86
   - equality delete writer: 
https://github.com/apache/iceberg-cpp/blob/c0c6b01393070b0813f49b0d6220c98256379cef/src/iceberg/data/equality_delete_writer.cc#L80-L86
   - position delete writer: 
https://github.com/apache/iceberg-cpp/blob/c0c6b01393070b0813f49b0d6220c98256379cef/src/iceberg/data/position_delete_writer.cc#L140-L146
   
   Other code already treats a negative record count as unknown/missing:
   
   - inclusive metrics: 
https://github.com/apache/iceberg-cpp/blob/c0c6b01393070b0813f49b0d6220c98256379cef/src/iceberg/expression/inclusive_metrics_evaluator.cc#L507-L516
   - count aggregate: 
https://github.com/apache/iceberg-cpp/blob/c0c6b01393070b0813f49b0d6220c98256379cef/src/iceberg/expression/aggregate.cc#L379-L388
   
   Impact: a file with an unknown row count is reported as `ROWS_MUST_MATCH` 
for any predicate, including ones that can never match, such as `AlwaysFalse`.
   
   Suggested fix: only special-case `record_count == 0`; let negative/unknown 
counts fall through to normal strict metrics evaluation.
   
   A focused regression test with `record_count = -1` and `AlwaysFalse` fails 
before changing `<= 0` to `== 0`, and passes after.
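   
   To illustrate the difference, here is a minimal, self-contained sketch. It does 
not use the real iceberg-cpp types: `DataFile`, `Predicate`, 
`EvaluatePredicate`, `kRowsMightNotMatch`, and both `StrictEvaluate*` functions 
are hypothetical stand-ins, and only the constant predicates are modeled. It 
only shows how the `<= 0` guard and the `== 0` guard diverge for 
`record_count = -1`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration only; not the iceberg-cpp API.
enum class Predicate { kAlwaysTrue, kAlwaysFalse };

struct DataFile {
  int64_t record_count;  // -1 is the "unknown" sentinel described in the issue
};

constexpr bool kRowsMustMatch = true;       // mirrors ROWS_MUST_MATCH
constexpr bool kRowsMightNotMatch = false;  // assumed name for the opposite result

// Simplified "normal" strict evaluation: only constant predicates are handled.
bool EvaluatePredicate(Predicate p) {
  return p == Predicate::kAlwaysTrue ? kRowsMustMatch : kRowsMightNotMatch;
}

// Current guard: unknown (-1) counts are treated like empty files.
bool StrictEvaluateBefore(const DataFile& file, Predicate p) {
  if (file.record_count <= 0) return kRowsMustMatch;
  return EvaluatePredicate(p);
}

// Suggested guard: only a known-empty file short-circuits.
bool StrictEvaluateAfter(const DataFile& file, Predicate p) {
  if (file.record_count == 0) return kRowsMustMatch;
  return EvaluatePredicate(p);
}

// Regression-style check in the spirit of the test described above.
void CheckUnknownCountRegression() {
  const DataFile unknown{-1};
  // Before: AlwaysFalse is wrongly reported as "all rows must match".
  assert(StrictEvaluateBefore(unknown, Predicate::kAlwaysFalse) == kRowsMustMatch);
  // After: the unknown count falls through to normal evaluation.
  assert(StrictEvaluateAfter(unknown, Predicate::kAlwaysFalse) == kRowsMightNotMatch);
}
```

   In this sketch a file with `record_count == 0` still short-circuits to 
`kRowsMustMatch` under both guards, so the empty-file behavior is unchanged.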
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
