rdblue commented on code in PR #12781:
URL: https://github.com/apache/iceberg/pull/12781#discussion_r2047884134


##########
format/spec.md:
##########
@@ -786,9 +790,11 @@ Notes:
 
 #### First Row ID Assignment
 
-When adding a new data manifest file, its `first_row_id` field is assigned the value of the snapshot's `first_row_id` plus the sum of `added_rows_count` for all data manifests that preceded the manifest in the manifest list.
+The `first_row_id` for existing manifests must be preserved when writing a new manifest list. The value of `first_row_id` for delete manifests is always `null`. The `first_row_id` is only assigned for data manifests that do not have a `first_row_id`. Assignment must account for data files that will be assigned `first_row_id` values when the manifest is read.
 
-The `first_row_id` is only assigned for new data manifests. Values for existing manifests must be preserved when writing a new manifest list. The value of `first_row_id` for delete manifests is always `null`.
+The first manifest without a `first_row_id` is assigned a value that is greater than or equal to the `first_row_id` of the snapshot. Subsequent manifests without a `first_row_id` are assigned one based on the previous manifest to be assigned a `first_row_id`. Each assigned `first_row_id` must increase by the row count of all files that will be assigned a `first_row_id` via inheritance in the last assigned manifest. That is, each `first_row_id` must be greater than or equal to the last assigned `first_row_id` plus the total record count of data files with a null `first_row_id` in the last assigned manifest.

Review Comment:
   The second sentence is intended to clarify how to interpret the requirement 
(the "must be") by stating that the `first_row_id` that is assigned is based on 
the last manifest to be assigned a `first_row_id` -- without saying _how_ it is 
"based on" that manifest. My intent was to make the "how" sentence easier to 
understand, rather than stating the same complicated thing twice.
   
   The complication I'm trying to avoid is the distinction between the manifest 
that precedes the one being assigned a `first_row_id` and the last manifest to 
be assigned a `first_row_id`.
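
   The distinction can be illustrated with a minimal sketch. It is not the Iceberg implementation: `Manifest`, `rows_needing_ids`, and `assign_first_row_ids` are hypothetical names chosen for illustration, and the sketch uses the smallest value the rule allows (the spec text only requires "greater than or equal to"). The running counter advances only when a manifest is assigned a `first_row_id`, so an existing manifest or a delete manifest that happens to sit between two new manifests has no effect on the next assignment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Manifest:
    # Hypothetical stand-in for a manifest list entry, not an Iceberg class.
    is_data: bool                 # False for delete manifests
    first_row_id: Optional[int]   # None if not yet assigned
    rows_needing_ids: int = 0     # total record count of data files whose
                                  # first_row_id is null (assigned by inheritance)


def assign_first_row_ids(snapshot_first_row_id: int,
                         manifests: List[Manifest]) -> List[Optional[int]]:
    """Return the first_row_id to write for each manifest, in list order."""
    next_row_id = snapshot_first_row_id
    assigned: List[Optional[int]] = []
    for m in manifests:
        if not m.is_data:
            # Delete manifests always carry a null first_row_id.
            assigned.append(None)
        elif m.first_row_id is not None:
            # Existing values are preserved and do not move the counter.
            assigned.append(m.first_row_id)
        else:
            # Assign, then advance by the rows that will inherit IDs
            # from this manifest (the "last assigned manifest").
            assigned.append(next_row_id)
            next_row_id += m.rows_needing_ids
    return assigned


example = [
    Manifest(is_data=True, first_row_id=None, rows_needing_ids=100),  # new
    Manifest(is_data=True, first_row_id=5),                           # existing
    Manifest(is_data=False, first_row_id=None),                       # delete
    Manifest(is_data=True, first_row_id=None, rows_needing_ids=50),   # new
]
result = assign_first_row_ids(1000, example)
```

   In this example the fourth manifest is preceded in the list by a delete manifest and an existing data manifest, but its value is derived from the first manifest, which is the last one to be assigned a `first_row_id`.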



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


