rdblue commented on code in PR #12781:
URL: https://github.com/apache/iceberg/pull/12781#discussion_r2047884134
##########
format/spec.md:
##########
@@ -786,9 +790,11 @@ Notes:
 #### First Row ID Assignment
 
-When adding a new data manifest file, its `first_row_id` field is assigned the value of the snapshot's `first_row_id` plus the sum of `added_rows_count` for all data manifests that preceded the manifest in the manifest list.
+The `first_row_id` for existing manifests must be preserved when writing a new manifest list. The value of `first_row_id` for delete manifests is always `null`. The `first_row_id` is only assigned for data manifests that do not have a `first_row_id`. Assignment must account for data files that will be assigned `first_row_id` values when the manifest is read.
 
-The `first_row_id` is only assigned for new data manifests. Values for existing manifests must be preserved when writing a new manifest list. The value of `first_row_id` for delete manifests is always `null`.
+The first manifest without a `first_row_id` is assigned a value that is greater than or equal to the `first_row_id` of the snapshot. Subsequent manifests without a `first_row_id` are assigned one based on the previous manifest to be assigned a `first_row_id`. Each assigned `first_row_id` must increase by the row count of all files that will be assigned a `first_row_id` via inheritance in the last assigned manifest. That is, each `first_row_id` must be greater than or equal to the last assigned `first_row_id` plus the total record count of data files with a null `first_row_id` in the last assigned manifest.

Review Comment:
The second sentence is intended to clarify how to interpret the requirement (the "must be") by stating that the `first_row_id` that is assigned is based on the last manifest to be assigned a `first_row_id` -- without saying _how_ it is "based on" that manifest. My intent was to make the "how" sentence easier to understand, rather than stating the same complicated thing twice.
The complication I'm trying to avoid is the distinction between the manifest that precedes the one being assigned a `first_row_id` and the last manifest to be assigned a `first_row_id`.
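
To make that distinction concrete, here is a minimal Python sketch of one assignment that would satisfy the wording in the diff. It is an illustration under assumptions, not Iceberg's implementation: `Manifest`, `inherited_row_count`, and `assign_first_row_ids` are hypothetical names, and the sketch always picks the smallest value allowed, whereas the spec text only requires "greater than or equal to".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Manifest:
    # Hypothetical stand-in for a manifest list entry; not the Iceberg API.
    content: str                  # "data" or "deletes"
    first_row_id: Optional[int]   # None means no first_row_id has been assigned
    # Assumed field: total record count of data files in this manifest
    # that have a null first_row_id and will inherit one on read.
    inherited_row_count: int = 0


def assign_first_row_ids(
    manifests: List[Manifest], snapshot_first_row_id: int
) -> List[Manifest]:
    """Return a new manifest list with first_row_id filled in where missing."""
    # Next value to hand out; it advances only when a manifest is assigned,
    # so it tracks the last *assigned* manifest, not the preceding entry.
    next_row_id = snapshot_first_row_id
    result = []
    for m in manifests:
        if m.content != "data" or m.first_row_id is not None:
            # Delete manifests stay null; existing values are preserved.
            result.append(m)
            continue
        result.append(Manifest(m.content, next_row_id, m.inherited_row_count))
        next_row_id += m.inherited_row_count
    return result


manifests = [
    Manifest("data", 100, 50),    # existing manifest, value preserved
    Manifest("data", None, 10),   # first manifest without a first_row_id
    Manifest("deletes", None),    # delete manifest, always null
    Manifest("data", None, 5),    # based on the last assigned manifest
]
assigned = assign_first_row_ids(manifests, snapshot_first_row_id=1000)
print([m.first_row_id for m in assigned])
```

With a snapshot `first_row_id` of 1000 this prints `[100, 1000, None, 1010]`. The last manifest gets 1010 from the second entry (the last one to be assigned a value), even though the entry immediately preceding it is a delete manifest with a null `first_row_id`.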