danielcweeks commented on code in PR #12580:
URL: https://github.com/apache/iceberg/pull/12580#discussion_r2012492804
##########
format/spec.md:
##########
@@ -408,16 +408,17 @@ When `null`, a row's `_row_id` field is assigned to the `first_row_id` from its
 Values for `_row_id` and `_last_updated_sequence_number` are either read from the data file or assigned at read time. As a result on read, rows in a table always have non-null values for these fields when lineage is enabled.

-When an existing row is moved to a different data file for any reason, writers are required to write `_row_id` and `_last_updated_sequence_number` according to the following rules:
+When an existing row is moved to a different data file for any reason, writers should write `_row_id` and `_last_updated_sequence_number` according to the following rules:

 1. The row's existing non-null `_row_id` must be copied into the new data file
 2. If the write has modified the row, the `_last_updated_sequence_number` field must be set to `null` (so that the modification's sequence number replaces the current value)
 3. If the write has not modified the row, the existing non-null `_last_updated_sequence_number` value must be copied to the new data file

+The semantics of whether an operation affecting existing rows is modeled as deleting all modified rows and adding new rows, or upsert with preserved row ids is left to the implementing engine.

Review Comment:
   > . . . can choose not follow the above rules . . .

   I really don't like using language like this because it implies that:

   1. they're doing it wrong (when we don't really know) and
   2. that it's ok to break the rules.

   The reason I use "semantics" in terms of the operation is that some operations are likely correct to use delete/add as the intent is a replace even though the same row id may be involved. At that point it's a semantic argument and I don't want to try to force engine behavior.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
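
For readers following the thread, the three copy rules quoted in the hunk above can be sketched roughly as below. This is only an illustration of the "upsert with preserved row ids" reading, not Iceberg code: `LineageFields` and `carry_over_lineage` are hypothetical names invented here, and an engine that models the operation as delete/add would not go through this path at all.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LineageFields:
    """Hypothetical holder for the two row-lineage fields of a single row."""
    row_id: Optional[int]
    last_updated_sequence_number: Optional[int]


def carry_over_lineage(existing: LineageFields, modified: bool) -> LineageFields:
    """Return the lineage values to write when an existing row moves to a new data file.

    `existing` holds the values as resolved on read, which the quoted spec text
    says are non-null when lineage is enabled.
    """
    if existing.row_id is None:
        raise ValueError("existing row is expected to have a non-null _row_id")

    if modified:
        # Rule 1 + rule 2: keep _row_id, write null so that the modification's
        # sequence number replaces the current value.
        return LineageFields(existing.row_id, None)

    if existing.last_updated_sequence_number is None:
        raise ValueError(
            "unmodified row is expected to have a non-null _last_updated_sequence_number"
        )
    # Rule 1 + rule 3: copy both values unchanged.
    return LineageFields(existing.row_id, existing.last_updated_sequence_number)
```

Under these assumptions, a rewritten but unmodified row keeps both values, while a modified row keeps only its `_row_id`.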
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org