rdblue commented on code in PR #12781: URL: https://github.com/apache/iceberg/pull/12781#discussion_r2043118532
########## format/spec.md: ########## @@ -689,9 +690,11 @@ When reading v1 manifests with no sequence number column, sequence numbers for a When adding a new data file, its `first_row_id` field is set to `null` because it is not assigned until the snapshot is successfully committed. -When reading, the `first_row_id` is assigned by replacing `null` with the manifest's `first_row_id` plus the sum of `record_count` for all added data files that preceded the file in the manifest. +When reading, the `first_row_id` is assigned by replacing `null` with the manifest's `first_row_id` plus the sum of `record_count` for all data files that preceded the file in the manifest that had a null `first_row_id`. -The `first_row_id` is only inherited for added data files. The inherited value must be written into the data file metadata for existing and deleted entries. The value of `first_row_id` for delete files is always `null`. +The inherited value of `first_row_id` must be written into the data file metadata when creating existing and deleted entries. The value of `first_row_id` for delete files is always `null`. + +In most cases, only added files will be assigned a new `first_row_id` via inheritance, but any unassigned `first_row_id` must be assigned to handle manifests in upgraded tables that have not yet assigned `first_row_id` for existing entries. Review Comment: I updated this to: > Any null (unassigned) `first_row_id` must be assigned via inheritance, even if the data file is existing. This ensures that row IDs are assigned to existing data files in upgraded tables in the first commit after upgrading to v3. There's no need to talk about "usually" and make assumptions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org