rdblue commented on code in PR #11130:
URL: https://github.com/apache/iceberg/pull/11130#discussion_r1813670115


##########
format/spec.md:
##########
@@ -298,16 +298,101 @@ Iceberg tables must not use field ids greater than 2147483447 (`Integer.MAX_VALU
 
 The set of metadata columns is:
 
-| Field id, name              | Type          | Description |
-|-----------------------------|---------------|-------------|
-| **`2147483646  _file`**     | `string`      | Path of the file in which a row is stored |
-| **`2147483645  _pos`**      | `long`        | Ordinal position of a row in the source data file |
-| **`2147483644  _deleted`**  | `boolean`     | Whether the row has been deleted |
-| **`2147483643  _spec_id`**  | `int`         | Spec ID used to track the file containing a row |
-| **`2147483642  _partition`** | `struct`     | Partition to which a row belongs |
-| **`2147483546  file_path`** | `string`      | Path of a file, used in position-based delete files |
-| **`2147483545  pos`**       | `long`        | Ordinal position of a row, used in position-based delete files |
-| **`2147483544  row`**       | `struct<...>` | Deleted row values, used in position-based delete files |
+| Field id, name                   | Type          | Description |
+|----------------------------------|---------------|--------------------------------------------------------------------------------------------------------|
+| **`2147483646  _file`**          | `string`      | Path of the file in which a row is stored |
+| **`2147483645  _pos`**           | `long`        | Ordinal position of a row in the source data file, starting at `0` |
+| **`2147483644  _deleted`**       | `boolean`     | Whether the row has been deleted |
+| **`2147483643  _spec_id`**       | `int`         | Spec ID used to track the file containing a row |
+| **`2147483642  _partition`**     | `struct`      | Partition to which a row belongs |
+| **`2147483546  file_path`**      | `string`      | Path of a file, used in position-based delete files |
+| **`2147483545  pos`**            | `long`        | Ordinal position of a row, used in position-based delete files |
+| **`2147483544  row`**            | `struct<...>` | Deleted row values, used in position-based delete files |
+| **`2147483543  _row_id`**        | `long`        | A unique long assigned when row-lineage is enabled, see [Row Lineage](#row-lineage) |
+| **`2147483542  _last_updated_sequence_number`**   | `long`        | The sequence number which last updated this row when row-lineage is enabled [Row Lineage](#row-lineage) |
+
+### Row Lineage
+
+In v3 and later, an Iceberg table can track row lineage fields for all newly created rows.  Row lineage is enabled by setting the field `row-lineage` to true in the table's metadata. When enabled, engines must maintain the `next-row-id` table field and the following row-level fields when writing data files:
+
+* `_row_id` a unique long identifier for every row within the table. The value is assigned via inheritance when a row is first added to the table and the existing value is explicitly written when the row is written to a new file.

Review Comment:
   I think I disagree with the "copied" language. When "the row" is written 
(even if it is modified) the row ID should be preserved. To me, "copied" 
implies that the row is not modified, opening a question of how to handle row 
modification.
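
   To make the distinction concrete, here is a minimal Python sketch of the behavior described in this comment. It is illustrative only and not Iceberg code: the `Row` class and `rewrite_row` function are hypothetical names, and the handling of `_last_updated_sequence_number` is an assumption based on the column description in the table above.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Row:
    # Hypothetical in-memory stand-in for a row with lineage fields.
    row_id: int                        # stands in for `_row_id`
    last_updated_sequence_number: int  # stands in for `_last_updated_sequence_number`
    values: Dict[str, Any]


def rewrite_row(row: Row, sequence_number: int,
                changes: Optional[Dict[str, Any]] = None) -> Row:
    """Write an existing row to a new file, optionally modifying it.

    The row ID is preserved in both cases. Assumption: only a modification
    advances the last-updated sequence number.
    """
    if not changes:
        # Unmodified rewrite: carry both lineage fields over unchanged.
        return replace(row, values=dict(row.values))
    # Modified rewrite: same row ID, new values, new sequence number.
    return replace(
        row,
        values={**row.values, **changes},
        last_updated_sequence_number=sequence_number,
    )
```

   In this sketch, a modified row keeps its `row_id`, which is the case that "copied" leaves ambiguous.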



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

