varun-lakhyani opened a new pull request, #15252:
URL: https://github.com/apache/iceberg/pull/15252

   ## Description
   Implements missing-column detection in `StrictMetricsEvaluator` using the 
file's max field ID. This is mostly useful after schema evolution.
   
   Resolves TODO comment in `StrictMetricsEvaluator.java:72`
   
   ## Changes
   - Add `maxFieldId` to the `DataFile` interface: tracks the max field ID of 
the schema used to write the file
   - Use `maxFieldId` in `StrictMetricsEvaluator` to detect missing columns:
     - If a column's field ID is greater than the file's max field ID, the 
column was added by schema evolution after the file was written, so every 
value for it in that file is null:
       - `isNull` and `isNotNaN` return `ROWS_MUST_MATCH`
       - All other operations return `ROWS_MIGHT_NOT_MATCH`
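
   The rule above can be sketched in isolation as follows. This is only an 
illustration and not the code in this PR: `MissingColumnSketch`, `Op`, 
`isMissing`, and `evalMissingColumn` are hypothetical names, the two result 
constants are modeled as plain booleans, and treating an unknown (null) max 
field ID as "not missing" is an assumption.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-alone sketch; none of these names are Iceberg API.
public class MissingColumnSketch {
  // Illustrative subset of predicate operations.
  enum Op { IS_NULL, NOT_NULL, IS_NAN, NOT_NAN, LT, GT, EQ, NOT_EQ, IN, NOT_IN }

  // Modeled as booleans for this sketch.
  static final boolean ROWS_MUST_MATCH = true;
  static final boolean ROWS_MIGHT_NOT_MATCH = false;

  // Operations the description says must match when the column is missing.
  private static final Set<Op> MATCH_WHEN_MISSING = EnumSet.of(Op.IS_NULL, Op.NOT_NAN);

  // A column is treated as missing when its field ID is greater than the
  // max field ID recorded for the file. Assumption: a null max field ID
  // (value not recorded) means we cannot tell, so report "not missing".
  static boolean isMissing(int fieldId, Integer fileMaxFieldId) {
    return fileMaxFieldId != null && fieldId > fileMaxFieldId;
  }

  // Result for a predicate on a column already known to be missing.
  static boolean evalMissingColumn(Op op) {
    return MATCH_WHEN_MISSING.contains(op) ? ROWS_MUST_MATCH : ROWS_MIGHT_NOT_MATCH;
  }

  public static void main(String[] args) {
    int fieldId = 5;            // column added after the file was written
    Integer fileMaxFieldId = 3; // max field ID of the schema that wrote the file
    if (isMissing(fieldId, fileMaxFieldId)) {
      for (Op op : Op.values()) {
        System.out.println(op + " -> " + evalMissingColumn(op));
      }
    }
  }
}
```

   Keeping the "is this column missing" check separate from the per-operation 
result makes it easy to cover each operation with its own unit test.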
   
   ## How Has This Been Tested?
   - Added unit tests covering all operations:
     - `isNull` and `isNotNaN` return `ROWS_MUST_MATCH` for missing columns
     - All other operations return `ROWS_MIGHT_NOT_MATCH` for missing columns


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

