rdblue commented on code in PR #12781:
URL: https://github.com/apache/iceberg/pull/12781#discussion_r2043127255


##########
format/spec.md:
##########
@@ -786,9 +786,9 @@ Notes:
 
 #### First Row ID Assignment
 
-When adding a new data manifest file, its `first_row_id` field is assigned the 
value of the snapshot's `first_row_id` plus the sum of `added_rows_count` for 
all data manifests that preceded the manifest in the manifest list.
+When adding a new data manifest file, its `first_row_id` field is assigned the 
value of the snapshot's `first_row_id` plus the sum of `added_rows_count` and 
`existing_rows_count` for all data manifests that preceded the manifest in the 
manifest list and were assigned a `first_row_id`.

Review Comment:
   Updated, but this is a bit tricky because these are simply new manifests. 
Having a null `first_row_id` is an implementation detail that is not written 
into metadata. I've updated it to say both:
   
   > When adding a new data manifest file, its `first_row_id` field is assigned 
the value of the snapshot's `first_row_id` plus the sum of `added_rows_count` 
and `existing_rows_count` for all new data manifests that preceded it in the 
manifest list; that is, those that had a null `first_row_id` and were assigned 
one.
   
   I also reformatted this section a little.
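
   For illustration only, here is a minimal sketch of the assignment rule quoted above. It is not the Iceberg implementation: the `Manifest` class and the `assign_first_row_ids` function are hypothetical names, and a `first_row_id` of `None` stands in for a new manifest that has not yet been assigned an ID. Only the field names `first_row_id`, `added_rows_count`, and `existing_rows_count` come from the spec text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Manifest:
    """Hypothetical stand-in for a data manifest entry in a manifest list."""
    added_rows_count: int
    existing_rows_count: int
    # None models a new manifest that has not been assigned a first_row_id yet.
    first_row_id: Optional[int] = None


def assign_first_row_ids(snapshot_first_row_id: int, manifests: List[Manifest]) -> int:
    """Assign first_row_id to new manifests in manifest-list order.

    Each new manifest gets the snapshot's first_row_id plus the sum of
    added_rows_count and existing_rows_count over the new manifests that
    preceded it. Manifests that already carry a first_row_id are skipped
    and do not contribute to the running sum.
    Returns the next unassigned row ID.
    """
    next_row_id = snapshot_first_row_id
    for manifest in manifests:
        if manifest.first_row_id is None:
            manifest.first_row_id = next_row_id
            next_row_id += manifest.added_rows_count + manifest.existing_rows_count
    return next_row_id


manifests = [
    Manifest(added_rows_count=10, existing_rows_count=5),                  # new
    Manifest(added_rows_count=3, existing_rows_count=0, first_row_id=0),   # already assigned
    Manifest(added_rows_count=7, existing_rows_count=2),                   # new
]
next_id = assign_first_row_ids(100, manifests)
```

   In this sketch the first new manifest receives 100, the already-assigned manifest keeps its ID and is excluded from the sum, and the second new manifest receives 100 + 10 + 5 = 115.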



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

