RussellSpitzer commented on code in PR #12982: URL: https://github.com/apache/iceberg/pull/12982#discussion_r2076321787
########## format/spec.md: ########## @@ -1680,6 +1680,25 @@ Row-level delete changes: * These position delete files must be merged into the DV for a data file when one is created * Position delete files that contain deletes for more than one data file need to be kept in table metadata until all deletes are replaced by DVs +Row lineage changes: + +* Writers must set the table's `next-row-id` and use the existing `next-row-id` as the `first-row-id` to create a new snapshot + * When a table is upgraded to v3, `next_row_id` should be initialized to 0 + * When committing a new snapshot `next-row-id` must be incremented by at least the number of newly assigned row ids in the snapshot + * It is recommended to increment `next-row-id` by the total `added_rows_count` and `existing_rows_count` of all manifests assigned a `first_row_id` +* Writers must assign a `first_row_id` to new data manifests when writing a manifest list + * When writing a new manifest list each `first_row_id` must be incremented by at least the number of newly assigned row ids in each manifest + * It is recommended to increment `first_row_id` by a manifest's `added_rows_count` and `existing_rows_count` +* Readers must assign a `first_row_id` for any data file that does not have an assigned value written in a manifest + * Readers must increment `first_row_id` by the data file's `record_count` +* When writing an existing data file into a new manifest, its `first_row_id` must be written into the manifest +* When a data file has a non-null `first_row_id`, readers must: + * Replace any null or missing `_row_id` with the data file's `first_row_id` plus the row's `_pos` + * Replace any null or missing `_last_updated_sequence_number` to the data file's `data_sequence_number` + * Read any non-null `_row_id` or `_last_updated_sequence_number` without modification +* When a data file has a null `first_row_id`, readers must produce null for `_row_id` and `_last_updated_sequence_number` Review Comment: I think maybe for ``` For backwards compatibility, any snapshot with a null or missing `first_row_id` must produce null for `_row_id` and `_last_updated_sequence_number` for all rows in that snapshot. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org