rdblue commented on code in PR #11130:
URL: https://github.com/apache/iceberg/pull/11130#discussion_r1791001221


##########
format/spec.md:
##########
@@ -298,16 +298,102 @@ Iceberg tables must not use field ids greater than 
2147483447 (`Integer.MAX_VALU
 
 The set of metadata columns is:
 
-| Field id, name              | Type          | Description |
-|-----------------------------|---------------|-------------|
-| **`2147483646  _file`**     | `string`      | Path of the file in which a 
row is stored |
-| **`2147483645  _pos`**      | `long`        | Ordinal position of a row in 
the source data file |
-| **`2147483644  _deleted`**  | `boolean`     | Whether the row has been 
deleted |
-| **`2147483643  _spec_id`**  | `int`         | Spec ID used to track the file 
containing a row |
-| **`2147483642  _partition`** | `struct`     | Partition to which a row 
belongs |
-| **`2147483546  file_path`** | `string`      | Path of a file, used in 
position-based delete files |
-| **`2147483545  pos`**       | `long`        | Ordinal position of a row, 
used in position-based delete files |
-| **`2147483544  row`**       | `struct<...>` | Deleted row values, used in 
position-based delete files |
+| Field id, name                   | Type          | Description               
                                                                              |
+|----------------------------------|---------------|---------------------------------------------------------------------------------------------------------|
+| **`2147483646  _file`**          | `string`      | Path of the file in which 
a row is stored                                                               |
+| **`2147483645  _pos`**           | `long`        | Ordinal position of a row 
in the source data file, starting at `0`                                      |
+| **`2147483644  _deleted`**       | `boolean`     | Whether the row has been 
deleted                                                                        |
+| **`2147483643  _spec_id`**       | `int`         | Spec ID used to track the 
file containing a row                                                         |
+| **`2147483642  _partition`**     | `struct`      | Partition to which a row 
belongs                                                                        |
+| **`2147483546  file_path`**      | `string`      | Path of a file, used in 
position-based delete files                                                     
|
+| **`2147483545  pos`**            | `long`        | Ordinal position of a 
row, used in position-based delete files                                        
  |
+| **`2147483544  row`**            | `struct<...>` | Deleted row values, used 
in position-based delete files                                                 |
+| **`2147483543  _row_id`**        | `long`        | A unique long assigned 
when row-lineage is enabled see [Row Lineage](#row-lineage)                     
 |
+| **`2147483542  _last_updated_seq`**   | `long`        | The sequence number 
which last updated this row when row-lineage is enabled [Row 
Lineage](#row-lineage) |
+
+### Row Lineage
+
+In v3 and later, an Iceberg table can track row lineage fields for all newly 
created rows.  Row lineage is enabled by setting the field `row-lineage` to 
true in the table's metadata. When enabled, engines must maintain the 
`next-row-id` table field and the following row-level fields when writing data 
files:
+
+* `_row_id` a unique long identifier for every row within the table. The value 
is assigned via inheritance when a row is first added to the table and the 
existing value is explicitly written when the row is written to a new file.
+* `_last_updated_seq` the sequence number of the commit that last updated a 
row. The value is inherited when a row is first added or modified and the 
existing value is explicitly written when the row is written to a different 
data file but not modified.
+
+These fields are assigned and updated by inheritance because the commit 
sequence number and starting row ID are not assigned until the snapshot is 
successfully committed. Inheritance is used to allow writing data and manifest 
files before values are known so that it is not necessary to rewrite data and 
manifest files when an optimistic commit is retried.
+
+When row lineage is enabled, new snapshots cannot include [Equality 
Deletes](#equality-delete-files). Row lineage is incompatible with equality 
deletes because lineage values must be maintained, but equality deletes are 
used to avoid reading existing data before writing changes.
+
+
+#### Row lineage assignment
+
+Row lineage fields are written when row lineage is enabled. When not enabled, 
row lineage fields (`_row_id` and `_last_updated_seq`) must not be written to 
data files. The rest of this section applies when row lineage is enabled.
+
+When a row is added or modified, the `_last_updated_seq` field is set to 
`null` so that it is inherited when reading. Similarly, the `_row_id` field for 
an added row is set to `null` and assigned when reading.
+
+A data file with only new rows for the table may omit the `_last_updated_seq` 
and `_row_id`. Files read without must be treated as if both fields are null 
for all rows.
+
+On read, if `_last_updated_seq` is `null` it is assigned the `sequence_number` 
of the data file's manifest entry. The data sequence number of a data file is 
documented in [Sequence Number Inheritance](#sequence-number-inheritance).
+
+When `null`, a row's `_row_id` field is assigned to the `first_row_id` from 
its containing data file plus the row position in that data file (`_pos`). A 
data file's `first_row_id` field is assigned using inheritance and is 
documented in [First Row ID Inheritance](#first-row-id-inheritance). A 
manifest's `first_row_id` is assigned when writing the manifest list for a 
snapshot and is documented in [First Row ID 
Assignment](#first-row-id-assignment). A snapshot's `first-row-id` is to the 
table's `next-row-id` and is documented in [Snapshot Row 
IDs](#snapshot-row-ids).
+
+Values for `_row_id` and `_last_updated_seq` are either read from the data 
file or assigned at read time. As a result on read, rows in a table always have 
non-null values for these fields when lineage is enabled.
+
+When an existing row is moved to a different data file for any reason, writers 
are required to write `_row_id` and `_last_updated_seq` according to the 
following rules:
+
+1. The row's existing non-null `_row_id` must be copied into the new data file
+2. If the write has modified the row, the `_last_updated_seq` field must be 
set to `null` (so that the modification's sequence number replaces the current 
value)
+3. If the write has not modified the row, the existing non-null 
`_last_updated_seq` value must be copied to the new data file
+
+
+#### Row lineage example
+
+This example demonstrates how `_row_id` and `_last_updated_seq` are assigned 
for a snapshot when row lineage is enabled. This starts with a table with row 
lineage enabled and a `next-row-id` of 1000.
+
+Writing a new append snapshot would create snapshot metadata with 
`first-row-id` assigned to the table's `next-row-id`:
+
+```json
+{
+  "operation": "append",
+  "first-row-id": 1000,
+  ...
+}
+```
+
+The snapshot's manifest list would contain existing manifests, plus new 
manifests with an assigned `first_row_id` based on the `added_rows_count` of 
previously listed added manifests:
+
+| `manifest_path` | `added_rows_count` | `existing_rows_count` | 
`first_row_id`     |
+|-----------------|--------------------|-----------------------|--------------------|
+| ...             | ...                | ...                   | ...           
     |
+| existing        | 75                 | 0                     | 925           
     |
+| added1          | 100                | 25                    | 1000          
     |
+| added2          | 0                  | 100                   | 1100          
     |
+| added3          | 125                | 25                    | 1100          
     |
+
+The first added file, `added1`, is assigned the same `first_row_id` as the 
snapshot and the following manifests are assigned `first_row_id` based on the 
number of rows added by the previously listed manifests. The second file, 
`added2`, does not change the `first_row_id` of the next manifest because it 
contains no added data files.
+
+Within the first `added`, the first added manifest, each data file' 
`first_row_id` follows a similar pattern:

Review Comment:
   ```suggestion
   Within `added1`, the first added manifest, each data file's `first_row_id` follows a similar pattern:
   ```
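
For context, the manifest-level assignment that the quoted example table illustrates can be sketched roughly as below. This is only a minimal sketch based on the wording in the hunk ("based on the `added_rows_count` of previously listed added manifests"). The `ManifestEntry` class and `assign_first_row_ids` function are hypothetical names made up for illustration and are not part of the Iceberg API. The sketch also assumes that a manifest with an already assigned `first_row_id` (the `existing` row) keeps its value.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ManifestEntry:
    """Hypothetical stand-in for one row of the manifest list table."""
    manifest_path: str
    added_rows_count: int
    existing_rows_count: int
    first_row_id: Optional[int] = None  # None means "not yet assigned"


def assign_first_row_ids(manifests: List[ManifestEntry],
                         snapshot_first_row_id: int) -> int:
    """Assign first_row_id to each unassigned manifest, in list order.

    Each new manifest starts where the previous new manifest's added rows
    end. Returns the running row ID after the last manifest.
    """
    next_id = snapshot_first_row_id
    for manifest in manifests:
        if manifest.first_row_id is None:
            manifest.first_row_id = next_id
            # Only added rows consume new row IDs (assumption taken from
            # the hunk's wording); existing rows already carry IDs.
            next_id += manifest.added_rows_count
    return next_id


# Values copied from the example table in the quoted diff.
manifests = [
    ManifestEntry("existing", 75, 0, first_row_id=925),
    ManifestEntry("added1", 100, 25),
    ManifestEntry("added2", 0, 100),
    ManifestEntry("added3", 125, 25),
]
running_total = assign_first_row_ids(manifests, snapshot_first_row_id=1000)
```

With the table's inputs this yields `first_row_id` values of 1000, 1100, and 1100 for `added1`, `added2`, and `added3`, matching the example: `added2` adds no rows, so it does not shift the value assigned to `added3`.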



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

