RussellSpitzer commented on code in PR #11130:
URL: https://github.com/apache/iceberg/pull/11130#discussion_r1759578432
##########
format/spec.md:
##########
@@ -298,16 +298,137 @@ Iceberg tables must not use field ids greater than 2147483447 (`Integer.MAX_VALU
The set of metadata columns is:
-| Field id, name | Type | Description |
-|-----------------------------|---------------|-------------|
-| **`2147483646 _file`** | `string` | Path of the file in which a row is stored |
-| **`2147483645 _pos`** | `long` | Ordinal position of a row in the source data file |
-| **`2147483644 _deleted`** | `boolean` | Whether the row has been deleted |
-| **`2147483643 _spec_id`** | `int` | Spec ID used to track the file containing a row |
-| **`2147483642 _partition`** | `struct` | Partition to which a row belongs |
-| **`2147483546 file_path`** | `string` | Path of a file, used in position-based delete files |
-| **`2147483545 pos`** | `long` | Ordinal position of a row, used in position-based delete files |
-| **`2147483544 row`** | `struct<...>` | Deleted row values, used in position-based delete files |
+| Field id, name | Type | Description |
+|-----------------------------------|---------------|-------------------------------------------------------------------------------|
+| **`2147483646 _file`** | `string` | Path of the file in which a row is stored |
+| **`2147483645 _pos`** | `long` | Ordinal position of a row in the source data file, starting at `0` |
+| **`2147483644 _deleted`** | `boolean` | Whether the row has been deleted |
+| **`2147483643 _spec_id`** | `int` | Spec ID used to track the file containing a row |
+| **`2147483642 _partition`** | `struct` | Partition to which a row belongs |
+| **`2147483546 file_path`** | `string` | Path of a file, used in position-based delete files |
+| **`2147483545 pos`** | `long` | Ordinal position of a row, used in position-based delete files |
+| **`2147483544 row`** | `struct<...>` | Deleted row values, used in position-based delete files |
+| **`2147483545 _row_identifier`** | `long` | A unique long assigned when row-lineage is enabled see [Row Lineage](#row-lineage) |
+| **`2147483545 _last_update`** | `long` | The sequence number which last updated this row when row-lineage is enabled [Row Lineage](#row-lineage) |
+
+### Row Lineage
+
+In Specification V3, an Iceberg Table can declare that engines must track row-lineage of all newly created rows. This
+requirement is controlled by setting the field `row-lineage` to true in the table's metadata. When true, two additional
+fields in data files will be available for all rows added to the table.
+
+* `_row_identifier` a unique long for every row. Computed via inheritance for rows in their original datafiles
Review Comment:
One alternative from the doc was to make this a combination of a random prefix
and an integer, which would remove the need for a monotonic integer in the
metadata. Since the metadata already contains other monotonic integers, I think
this would not help much unless we make a broader change.
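
To make the trade-off concrete, here is a rough sketch of the two schemes. It
is only an illustration: every name in it (`TableMetadata`, `reserve`,
`monotonic_row_id`, `new_file_prefix`, `prefixed_row_id`) is hypothetical and
comes from neither the PR nor the Iceberg API. It also assumes that
"inheritance" means a per-file starting id plus the row's position, which the
quoted diff does not spell out.

```python
import secrets
from dataclasses import dataclass


@dataclass
class TableMetadata:
    # Hypothetical table-level counter, monotonic like the other ids
    # already kept in the metadata.
    next_row_id: int = 0

    def reserve(self, row_count: int) -> int:
        """Reserve a contiguous id range for a new data file; return its start."""
        first = self.next_row_id
        self.next_row_id += row_count
        return first


def monotonic_row_id(first_row_id: int, pos: int) -> int:
    # Assumed inheritance rule: the file's reserved start plus the row's
    # ordinal position in that file.
    return first_row_id + pos


def new_file_prefix() -> int:
    # Alternative: each writer picks a random 64-bit prefix per file, so no
    # shared counter has to be advanced in the metadata.
    return secrets.randbits(64)


def prefixed_row_id(file_prefix: int, pos: int) -> tuple[int, int]:
    # The identifier is kept as a (prefix, position) pair in this sketch.
    return (file_prefix, pos)
```

In the first scheme every commit that adds rows has to advance `next_row_id`;
in the second the writer needs nothing from the metadata, at the cost of
relying on random prefixes not colliding.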
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]