RussellSpitzer commented on code in PR #12781:
URL: https://github.com/apache/iceberg/pull/12781#discussion_r2040475105


##########
format/spec.md:
##########
@@ -689,9 +690,11 @@ When reading v1 manifests with no sequence number column, 
sequence numbers for a
 
 When adding a new data file, its `first_row_id` field is set to `null` because 
it is not assigned until the snapshot is successfully committed.
 
-When reading, the `first_row_id` is assigned by replacing `null` with the 
manifest's `first_row_id` plus the sum of `record_count` for all added data 
files that preceded the file in the manifest.
+When reading, the `first_row_id` is assigned by replacing `null` with the 
manifest's `first_row_id` plus the sum of `record_count` for all data files 
that preceded the file in the manifest that had a null `first_row_id`.
 
-The `first_row_id` is only inherited for added data files. The inherited value 
must be written into the data file metadata for existing and deleted entries. 
The value of `first_row_id` for delete files is always `null`.
+The inherited value of `first_row_id` must be written into the data file 
metadata when creating existing and deleted entries. The value of 
`first_row_id` for delete files is always `null`.
+
+In most cases, only added files will be assigned a new `first_row_id` via 
inheritance, but any unassigned `first_row_id` must be assigned to handle 
manifests in upgraded tables that have not yet assigned `first_row_id` for 
existing entries.

Review Comment:
   This sentence is a little difficult, I think there is a redundancy which is 
making a it a little confusing with "any unassigned ... that have not yet 
assigned"
   
   Maybe we can shorten this to.:
   ```suggestion
   Inheritance of `first_row_id` usually only applies to newly added files but 
during table format version upgrades existing files will also have a null value 
for `first_row_id` and must also be assigned via inheritance.
   ```
   ?
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to