varun-lakhyani opened a new pull request, #15252:
URL: https://github.com/apache/iceberg/pull/15252
## Description
Implements missing-column detection in `StrictMetricsEvaluator` using the
data file's max field ID. This is mostly useful after schema evolution, when
the table schema has columns that were added after the file was written.
Resolves TODO comment in `StrictMetricsEvaluator.java:72`
## Changes
- Add `maxFieldId` to the `DataFile` interface: it tracks the max field ID of
  the schema used to write the file
- Use `maxFieldId` in `StrictMetricsEvaluator` to detect missing columns:
  - If a field's ID is greater than the file's max field ID, the column was
    added by schema evolution after the file was written, so it is missing
    from the file and all of its values are null:
    - `isNull` and `isNotNaN` return `ROWS_MUST_MATCH`
    - All other operations return `ROWS_MIGHT_NOT_MATCH`
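
The rule above can be sketched as a small standalone Java class. This is only an illustration of the decision logic, not the code in this PR: the class name `MissingColumnSketch`, the `Op` enum, and both helper methods are hypothetical, and treating a `null` max field ID as "unknown" is an assumption made for the sketch. The two result constants are boolean stand-ins for the names used in the description.

```java
public class MissingColumnSketch {
    // Boolean stand-ins for the evaluator results named in the description.
    static final boolean ROWS_MUST_MATCH = true;
    static final boolean ROWS_MIGHT_NOT_MATCH = false;

    // Hypothetical enum covering the predicate kinds discussed above.
    enum Op { IS_NULL, NOT_NULL, IS_NAN, NOT_NAN, LT, LT_EQ, GT, GT_EQ, EQ, NOT_EQ, IN, NOT_IN }

    // A column is treated as missing when its field ID is greater than the
    // max field ID of the schema the file was written with.
    // Assumption: a null max field ID means "unknown", so nothing is inferred.
    static boolean isMissingColumn(int fieldId, Integer fileMaxFieldId) {
        return fileMaxFieldId != null && fieldId > fileMaxFieldId;
    }

    // Result for a predicate on a missing column, whose values are all null.
    static boolean evalMissingColumn(Op op) {
        switch (op) {
            case IS_NULL:
            case NOT_NAN:
                // Every value is null, and a null is never NaN.
                return ROWS_MUST_MATCH;
            default:
                // All other operations fall back to the conservative answer.
                return ROWS_MIGHT_NOT_MATCH;
        }
    }
}
```

For example, with a file whose max field ID is 3, a predicate on field 5 would take the missing-column path, while a predicate on field 2 would go through the regular metrics checks.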
## How Has This Been Tested?
- Added unit tests covering all operations:
  - `isNull` and `isNotNaN` return `ROWS_MUST_MATCH` for missing columns
  - All other operations return `ROWS_MIGHT_NOT_MATCH` for missing columns