rdblue commented on code in PR #12781:
URL: https://github.com/apache/iceberg/pull/12781#discussion_r2043118532


##########
format/spec.md:
##########
@@ -689,9 +690,11 @@ When reading v1 manifests with no sequence number column, sequence numbers for a
 
 When adding a new data file, its `first_row_id` field is set to `null` because it is not assigned until the snapshot is successfully committed.
 
-When reading, the `first_row_id` is assigned by replacing `null` with the manifest's `first_row_id` plus the sum of `record_count` for all added data files that preceded the file in the manifest.
+When reading, the `first_row_id` is assigned by replacing `null` with the manifest's `first_row_id` plus the sum of `record_count` for all data files that preceded the file in the manifest that had a null `first_row_id`.
 
-The `first_row_id` is only inherited for added data files. The inherited value must be written into the data file metadata for existing and deleted entries. The value of `first_row_id` for delete files is always `null`.
+The inherited value of `first_row_id` must be written into the data file metadata when creating existing and deleted entries. The value of `first_row_id` for delete files is always `null`.
+
+In most cases, only added files will be assigned a new `first_row_id` via inheritance, but any unassigned `first_row_id` must be assigned to handle manifests in upgraded tables that have not yet assigned `first_row_id` for existing entries.

Review Comment:
   I updated this to:
   
   > Any null (unassigned) `first_row_id` must be assigned via inheritance, even if the data file is existing. This ensures that row IDs are assigned to existing data files in upgraded tables in the first commit after upgrading to v3.
   
   There's no need to talk about "usually" and make assumptions.
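
To make the inheritance rule concrete, here is a minimal sketch of the read-side assignment as the updated wording describes it. It is not the Iceberg implementation: `DataFileEntry` and `inherit_first_row_ids` are hypothetical names used only for illustration, and the sketch assumes the manifest's own `first_row_id` is already known and that only data file entries are passed in (delete files always keep `null`).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataFileEntry:
    # Hypothetical stand-in for a manifest entry; not an Iceberg API class.
    status: str  # "added", "existing", or "deleted"
    record_count: int
    first_row_id: Optional[int] = None  # None models a null (unassigned) value


def inherit_first_row_ids(
    manifest_first_row_id: int, entries: List[DataFileEntry]
) -> List[int]:
    """Return the effective first_row_id for each data file entry, in order."""
    next_row_id = manifest_first_row_id
    assigned = []
    for entry in entries:
        if entry.first_row_id is None:
            # Any null value is inherited, regardless of entry status.
            assigned.append(next_row_id)
            # Only entries that had a null first_row_id advance the offset.
            next_row_id += entry.record_count
        else:
            # An explicitly written value is kept and does not move the offset.
            assigned.append(entry.first_row_id)
    return assigned


entries = [
    DataFileEntry("existing", record_count=100),                   # null, upgraded table
    DataFileEntry("existing", record_count=50, first_row_id=200),  # already assigned
    DataFileEntry("added", record_count=25),                       # null, new file
]
result = inherit_first_row_ids(1000, entries)
print(result)  # [1000, 200, 1100]
```

The existing entry with a null `first_row_id` is assigned in the same pass as the added file, which is the upgraded-table case the new wording is meant to cover.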



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

