sumedhsakdeo commented on code in PR #11130: URL: https://github.com/apache/iceberg/pull/11130#discussion_r1811903184
########## format/spec.md: ########## @@ -298,16 +298,101 @@ Iceberg tables must not use field ids greater than 2147483447 (`Integer.MAX_VALU The set of metadata columns is: -| Field id, name | Type | Description | -|-----------------------------|---------------|-------------| -| **`2147483646 _file`** | `string` | Path of the file in which a row is stored | -| **`2147483645 _pos`** | `long` | Ordinal position of a row in the source data file | -| **`2147483644 _deleted`** | `boolean` | Whether the row has been deleted | -| **`2147483643 _spec_id`** | `int` | Spec ID used to track the file containing a row | -| **`2147483642 _partition`** | `struct` | Partition to which a row belongs | -| **`2147483546 file_path`** | `string` | Path of a file, used in position-based delete files | -| **`2147483545 pos`** | `long` | Ordinal position of a row, used in position-based delete files | -| **`2147483544 row`** | `struct<...>` | Deleted row values, used in position-based delete files | +| Field id, name | Type | Description | +|----------------------------------|---------------|--------------------------------------------------------------------------------------------------------| +| **`2147483646 _file`** | `string` | Path of the file in which a row is stored | +| **`2147483645 _pos`** | `long` | Ordinal position of a row in the source data file, starting at `0` | +| **`2147483644 _deleted`** | `boolean` | Whether the row has been deleted | +| **`2147483643 _spec_id`** | `int` | Spec ID used to track the file containing a row | +| **`2147483642 _partition`** | `struct` | Partition to which a row belongs | +| **`2147483546 file_path`** | `string` | Path of a file, used in position-based delete files | +| **`2147483545 pos`** | `long` | Ordinal position of a row, used in position-based delete files | +| **`2147483544 row`** | `struct<...>` | Deleted row values, used in position-based delete files | +| **`2147483543 _row_id`** | `long` | A unique long assigned when row-lineage is enabled, see [Row Lineage](#row-lineage) | +| **`2147483542 _last_updated_sequence_number`** | `long` | The sequence number which last updated this row when row-lineage is enabled [Row Lineage](#row-lineage) | + +### Row Lineage + +In v3 and later, an Iceberg table can track row lineage fields for all newly created rows. Row lineage is enabled by setting the field `row-lineage` to true in the table's metadata. When enabled, engines must maintain the `next-row-id` table field and the following row-level fields when writing data files: + +* `_row_id` a unique long identifier for every row within the table. The value is assigned via inheritance when a row is first added to the table and the existing value is explicitly written when the row is copied into a new file. +* `_last_updated_sequence_number` the sequence number of the commit that last updated a row. The value is inherited when a row is first added or modified and the existing value is explicitly written when the row is written to a different data file but not modified. + +These fields are assigned and updated by inheritance because the commit sequence number and starting row ID are not assigned until the snapshot is successfully committed. Inheritance is used to allow writing data and manifest files before values are known so that it is not necessary to rewrite data and manifest files when an optimistic commit is retried. + +When row lineage is enabled, new snapshots cannot include [Equality Deletes](#equality-delete-files). Row lineage is incompatible with equality deletes because lineage values must be maintained, but equality deletes are used to avoid reading existing data before writing changes. + + +#### Row lineage assignment + +Row lineage fields are written when row lineage is enabled. When not enabled, row lineage fields (`_row_id` and `_last_updated_sequence_number`) must not be written to data files. The rest of this section applies when row lineage is enabled. + +When a row is added or modified, the `_last_updated_sequence_number` field is set to `null` so that it is inherited when reading. Similarly, the `_row_id` field for an added row is set to `null` and assigned when reading. + +A data file with only new rows for the table may omit the `_last_updated_sequence_number` and `_row_id`. If the columns are missing, readers should treat both columns as if they exist and are set to null for all rows. Review Comment: >> readers should treat both columns as if they exist and are set to null for all rows Clarifying, if we are also saying here, that `_last_updated_sequence_number` and `_row_id` are reserved column names in a table created with v3 spec. ########## format/spec.md: ########## @@ -298,16 +298,101 @@ Iceberg tables must not use field ids greater than 2147483447 (`Integer.MAX_VALU The set of metadata columns is: -| Field id, name | Type | Description | -|-----------------------------|---------------|-------------| -| **`2147483646 _file`** | `string` | Path of the file in which a row is stored | -| **`2147483645 _pos`** | `long` | Ordinal position of a row in the source data file | -| **`2147483644 _deleted`** | `boolean` | Whether the row has been deleted | -| **`2147483643 _spec_id`** | `int` | Spec ID used to track the file containing a row | -| **`2147483642 _partition`** | `struct` | Partition to which a row belongs | -| **`2147483546 file_path`** | `string` | Path of a file, used in position-based delete files | -| **`2147483545 pos`** | `long` | Ordinal position of a row, used in position-based delete files | -| **`2147483544 row`** | `struct<...>` | Deleted row values, used in position-based delete files | +| Field id, name | Type | Description | +|----------------------------------|---------------|--------------------------------------------------------------------------------------------------------| +| **`2147483646 _file`** | `string` | Path of the file in which a row is stored | +| **`2147483645 _pos`** | `long` | Ordinal position of a row in the source data file, starting at `0` | +| **`2147483644 _deleted`** | `boolean` | Whether the row has been deleted | +| **`2147483643 _spec_id`** | `int` | Spec ID used to track the file containing a row | +| **`2147483642 _partition`** | `struct` | Partition to which a row belongs | +| **`2147483546 file_path`** | `string` | Path of a file, used in position-based delete files | +| **`2147483545 pos`** | `long` | Ordinal position of a row, used in position-based delete files | +| **`2147483544 row`** | `struct<...>` | Deleted row values, used in position-based delete files | +| **`2147483543 _row_id`** | `long` | A unique long assigned when row-lineage is enabled, see [Row Lineage](#row-lineage) | +| **`2147483542 _last_updated_sequence_number`** | `long` | The sequence number which last updated this row when row-lineage is enabled [Row Lineage](#row-lineage) | + +### Row Lineage + +In v3 and later, an Iceberg table can track row lineage fields for all newly created rows. Row lineage is enabled by setting the field `row-lineage` to true in the table's metadata. When enabled, engines must maintain the `next-row-id` table field and the following row-level fields when writing data files: + +* `_row_id` a unique long identifier for every row within the table. The value is assigned via inheritance when a row is first added to the table and the existing value is explicitly written when the row is copied into a new file. +* `_last_updated_sequence_number` the sequence number of the commit that last updated a row. The value is inherited when a row is first added or modified and the existing value is explicitly written when the row is written to a different data file but not modified. + +These fields are assigned and updated by inheritance because the commit sequence number and starting row ID are not assigned until the snapshot is successfully committed. Inheritance is used to allow writing data and manifest files before values are known so that it is not necessary to rewrite data and manifest files when an optimistic commit is retried. + +When row lineage is enabled, new snapshots cannot include [Equality Deletes](#equality-delete-files). Row lineage is incompatible with equality deletes because lineage values must be maintained, but equality deletes are used to avoid reading existing data before writing changes. + + +#### Row lineage assignment + +Row lineage fields are written when row lineage is enabled. When not enabled, row lineage fields (`_row_id` and `_last_updated_sequence_number`) must not be written to data files. The rest of this section applies when row lineage is enabled. + +When a row is added or modified, the `_last_updated_sequence_number` field is set to `null` so that it is inherited when reading. Similarly, the `_row_id` field for an added row is set to `null` and assigned when reading. + +A data file with only new rows for the table may omit the `_last_updated_sequence_number` and `_row_id`. If the columns are missing, readers should treat both columns as if they exist and are set to null for all rows. + +On read, if `_last_updated_sequence_number` is `null` it is assigned the `sequence_number` of the data file's manifest entry. The data sequence number of a data file is documented in [Sequence Number Inheritance](#sequence-number-inheritance). + +When `null`, a row's `_row_id` field is assigned to the `first_row_id` from its containing data file plus the row position in that data file (`_pos`). A data file's `first_row_id` field is assigned using inheritance and is documented in [First Row ID Inheritance](#first-row-id-inheritance). A manifest's `first_row_id` is assigned when writing the manifest list for a snapshot and is documented in [First Row ID Assignment](#first-row-id-assignment). A snapshot's `first-row-id` is set to the table's `next-row-id` and is documented in [Snapshot Row IDs](#snapshot-row-ids). + +Values for `_row_id` and `_last_updated_sequence_number` are either read from the data file or assigned at read time. As a result on read, rows in a table always have non-null values for these fields when lineage is enabled. + +When an existing row is moved to a different data file for any reason, writers are required to write `_row_id` and `_last_updated_sequence_number` according to the following rules: Review Comment: When a user does INSERT OVERWRITE of an entire partition / table, some rows might be overwritten implicitly, as the operation is not copy-on-write, unlike MERGE INTO or UPDATE. For such cases, are we saying the rows are treated as new rows, and existing row _row_id or _last_updated_sequence_number will not be carried forward? ########## format/spec.md: ########## @@ -298,16 +298,101 @@ Iceberg tables must not use field ids greater than 2147483447 (`Integer.MAX_VALU The set of metadata columns is: -| Field id, name | Type | Description | -|-----------------------------|---------------|-------------| -| **`2147483646 _file`** | `string` | Path of the file in which a row is stored | -| **`2147483645 _pos`** | `long` | Ordinal position of a row in the source data file | -| **`2147483644 _deleted`** | `boolean` | Whether the row has been deleted | -| **`2147483643 _spec_id`** | `int` | Spec ID used to track the file containing a row | -| **`2147483642 _partition`** | `struct` | Partition to which a row belongs | -| **`2147483546 file_path`** | `string` | Path of a file, used in position-based delete files | -| **`2147483545 pos`** | `long` | Ordinal position of a row, used in position-based delete files | -| **`2147483544 row`** | `struct<...>` | Deleted row values, used in position-based delete files | +| Field id, name | Type | Description | +|----------------------------------|---------------|--------------------------------------------------------------------------------------------------------| +| **`2147483646 _file`** | `string` | Path of the file in which a row is stored | +| **`2147483645 _pos`** | `long` | Ordinal position of a row in the source data file, starting at `0` | +| **`2147483644 _deleted`** | `boolean` | Whether the row has been deleted | +| **`2147483643 _spec_id`** | `int` | Spec ID used to track the file containing a row | +| **`2147483642 _partition`** | `struct` | Partition to which a row belongs | +| **`2147483546 file_path`** | `string` | Path of a file, used in position-based delete files | +| **`2147483545 pos`** | `long` | Ordinal position of a row, used in position-based delete files | +| **`2147483544 row`** | `struct<...>` | Deleted row values, used in position-based delete files | +| **`2147483543 _row_id`** | `long` | A unique long assigned when row-lineage is enabled, see [Row Lineage](#row-lineage) | +| **`2147483542 _last_updated_sequence_number`** | `long` | The sequence number which last updated this row when row-lineage is enabled [Row Lineage](#row-lineage) | + +### Row Lineage + +In v3 and later, an Iceberg table can track row lineage fields for all newly created rows. Row lineage is enabled by setting the field `row-lineage` to true in the table's metadata. When enabled, engines must maintain the `next-row-id` table field and the following row-level fields when writing data files: + +* `_row_id` a unique long identifier for every row within the table. The value is assigned via inheritance when a row is first added to the table and the existing value is explicitly written when the row is copied into a new file. +* `_last_updated_sequence_number` the sequence number of the commit that last updated a row. The value is inherited when a row is first added or modified and the existing value is explicitly written when the row is written to a different data file but not modified. + +These fields are assigned and updated by inheritance because the commit sequence number and starting row ID are not assigned until the snapshot is successfully committed. Inheritance is used to allow writing data and manifest files before values are known so that it is not necessary to rewrite data and manifest files when an optimistic commit is retried. + +When row lineage is enabled, new snapshots cannot include [Equality Deletes](#equality-delete-files). Row lineage is incompatible with equality deletes because lineage values must be maintained, but equality deletes are used to avoid reading existing data before writing changes. + + +#### Row lineage assignment + +Row lineage fields are written when row lineage is enabled. When not enabled, row lineage fields (`_row_id` and `_last_updated_sequence_number`) must not be written to data files. The rest of this section applies when row lineage is enabled. + +When a row is added or modified, the `_last_updated_sequence_number` field is set to `null` so that it is inherited when reading. Similarly, the `_row_id` field for an added row is set to `null` and assigned when reading. + +A data file with only new rows for the table may omit the `_last_updated_sequence_number` and `_row_id`. If the columns are missing, readers should treat both columns as if they exist and are set to null for all rows. + +On read, if `_last_updated_sequence_number` is `null` it is assigned the `sequence_number` of the data file's manifest entry. The data sequence number of a data file is documented in [Sequence Number Inheritance](#sequence-number-inheritance). + +When `null`, a row's `_row_id` field is assigned to the `first_row_id` from its containing data file plus the row position in that data file (`_pos`). A data file's `first_row_id` field is assigned using inheritance and is documented in [First Row ID Inheritance](#first-row-id-inheritance). A manifest's `first_row_id` is assigned when writing the manifest list for a snapshot and is documented in [First Row ID Assignment](#first-row-id-assignment). A snapshot's `first-row-id` is set to the table's `next-row-id` and is documented in [Snapshot Row IDs](#snapshot-row-ids). + +Values for `_row_id` and `_last_updated_sequence_number` are either read from the data file or assigned at read time. As a result on read, rows in a table always have non-null values for these fields when lineage is enabled. + +When an existing row is moved to a different data file for any reason, writers are required to write `_row_id` and `_last_updated_sequence_number` according to the following rules: + +1. The row's existing non-null `_row_id` must be copied into the new data file +2. If the write has modified the row, the `_last_updated_sequence_number` field must be set to `null` (so that the modification's sequence number replaces the current value) +3. If the write has not modified the row, the existing non-null `_last_updated_sequence_number` value must be copied to the new data file + + +#### Row lineage example + +This example demonstrates how `_row_id` and `_last_updated_sequence_number` are assigned for a snapshot when row lineage is enabled. This starts with a table with row lineage enabled and a `next-row-id` of 1000. + +Writing a new append snapshot would create snapshot metadata with `first-row-id` assigned to the table's `next-row-id`: + +```json +{ + "operation": "append", + "first-row-id": 1000, + ... +} +``` + +The snapshot's manifest list would contain existing manifests, plus new manifests with an assigned `first_row_id` based on the `added_rows_count` of previously listed added manifests: + +| `manifest_path` | `added_rows_count` | `existing_rows_count` | `first_row_id` | +|-----------------|--------------------|-----------------------|--------------------| +| ... | ... | ... | ... | +| existing | 75 | 0 | 925 | +| added1 | 100 | 25 | 1000 | +| added2 | 0 | 100 | 1100 | +| added3 | 125 | 25 | 1100 | + +The first added file, `added1`, is assigned the same `first_row_id` as the snapshot and the following manifests are assigned `first_row_id` based on the number of rows added by the previously listed manifests. The second file, `added2`, does not change the `first_row_id` of the next manifest because it contains no added data files. + +Within `added1`, the first added manifest, each data file's `first_row_id` follows a similar pattern: + +| `status` | `file_path` | `record_count` | `first_row_id` | +|----------|-------------|----------------|----------------| +| EXISTING | data1 | 25 | 800 | +| ADDED | data2 | 50 | null (1000) | +| ADDED | data3 | 50 | null (1050) | + +The `first_row_id` of the EXISTING file `data1` was already assigned, so the file metadata was copied into manifest `added1`. + +Files `data2` and `data3` are written with `null` for `first_row_id` and are assigned `first_row_id` at read time based on the manifest's `first_row_id` and the `record_count` of previously listed ADDED files in this manifest: (1,000 + 0) and (1,000 + 50). + +When the new snapshot is committed, the table's `next-row-id` must also be updated (even if the new snapshot is not in the main branch). Because 225 rows were added (`added1`: 100 + `added2`: 0 + `added3`: 125), the new value is 1,000 + 225 = 1,225: + + +### Enabling Row Lineage for Non-empty Tables + +Any snapshot without the field `first-row-id` does not have any lineage information and values for `_row_id` and `_last_updated_sequence_number` cannot be assigned accurately. + +All files that were added before `row-lineage` was enabled should propagate null for all of the `row-lineage` related +fields. The values for `_row_id` and `_last_updated_sequence_number` should always return null and when these rows are copied, +null should be explicitly written. After this point, rows are treated as if they were just created +and assigned `row_id` and `_last_updated_sequence_number` as if they were new rows. Review Comment: For completeness, should we add line on expected behavior if disabling row lineage after enabling it for some time. ########## format/spec.md: ########## @@ -298,16 +298,102 @@ Iceberg tables must not use field ids greater than 2147483447 (`Integer.MAX_VALU The set of metadata columns is: -| Field id, name | Type | Description | -|-----------------------------|---------------|-------------| -| **`2147483646 _file`** | `string` | Path of the file in which a row is stored | -| **`2147483645 _pos`** | `long` | Ordinal position of a row in the source data file | -| **`2147483644 _deleted`** | `boolean` | Whether the row has been deleted | -| **`2147483643 _spec_id`** | `int` | Spec ID used to track the file containing a row | -| **`2147483642 _partition`** | `struct` | Partition to which a row belongs | -| **`2147483546 file_path`** | `string` | Path of a file, used in position-based delete files | -| **`2147483545 pos`** | `long` | Ordinal position of a row, used in position-based delete files | -| **`2147483544 row`** | `struct<...>` | Deleted row values, used in position-based delete files | +| Field id, name | Type | Description | +|----------------------------------|---------------|---------------------------------------------------------------------------------------------------------| +| **`2147483646 _file`** | `string` | Path of the file in which a row is stored | +| **`2147483645 _pos`** | `long` | Ordinal position of a row in the source data file, starting at `0` | +| **`2147483644 _deleted`** | `boolean` | Whether the row has been deleted | +| **`2147483643 _spec_id`** | `int` | Spec ID used to track the file containing a row | +| **`2147483642 _partition`** | `struct` | Partition to which a row belongs | +| **`2147483546 file_path`** | `string` | Path of a file, used in position-based delete files | +| **`2147483545 pos`** | `long` | Ordinal position of a row, used in position-based delete files | +| **`2147483544 row`** | `struct<...>` | Deleted row values, used in position-based delete files | +| **`2147483543 _row_id`** | `long` | A unique long assigned when row-lineage is enabled see [Row Lineage](#row-lineage) | +| **`2147483542 _last_updated_seq`** | `long` | The sequence number which last updated this row when row-lineage is enabled [Row Lineage](#row-lineage) | + +### Row Lineage + +In v3 and later, an Iceberg table can track row lineage fields for all newly created rows. Row lineage is enabled by setting the field `row-lineage` to true in the table's metadata. When enabled, engines must maintain the `next-row-id` table field and the following row-level fields when writing data files: + +* `_row_id` a unique long identifier for every row within the table. The value is assigned via inheritance when a row is first added to the table and the existing value is explicitly written when the row is written to a new file. +* `_last_updated_seq` the sequence number of the commit that last updated a row. The value is inherited when a row is first added or modified and the existing value is explicitly written when the row is written to a different data file but not modified. + +These fields are assigned and updated by inheritance because the commit sequence number and starting row ID are not assigned until the snapshot is successfully committed. Inheritance is used to allow writing data and manifest files before values are known so that it is not necessary to rewrite data and manifest files when an optimistic commit is retried. + +When row lineage is enabled, new snapshots cannot include [Equality Deletes](#equality-delete-files). Row lineage is incompatible with equality deletes because lineage values must be maintained, but equality deletes are used to avoid reading existing data before writing changes. + + +#### Row lineage assignment + +Row lineage fields are written when row lineage is enabled. When not enabled, row lineage fields (`_row_id` and `_last_updated_seq`) must not be written to data files. The rest of this section applies when row lineage is enabled. + +When a row is added or modified, the `_last_updated_seq` field is set to `null` so that it is inherited when reading. Similarly, the `_row_id` field for an added row is set to `null` and assigned when reading. + +A data file with only new rows for the table may omit the `_last_updated_seq` and `_row_id`. Files read without must be treated as if both fields are null for all rows. + +On read, if `_last_updated_seq` is `null` it is assigned the `sequence_number` of the data file's manifest entry. The data sequence number of a data file is documented in [Sequence Number Inheritance](#sequence-number-inheritance). + +When `null`, a row's `_row_id` field is assigned to the `first_row_id` from its containing data file plus the row position in that data file (`_pos`). A data file's `first_row_id` field is assigned using inheritance and is documented in [First Row ID Inheritance](#first-row-id-inheritance). A manifest's `first_row_id` is assigned when writing the manifest list for a snapshot and is documented in [First Row ID Assignment](#first-row-id-assignment). A snapshot's `first-row-id` is to the table's `next-row-id` and is documented in [Snapshot Row IDs](#snapshot-row-ids). + +Values for `_row_id` and `_last_updated_seq` are either read from the data file or assigned at read time. As a result on read, rows in a table always have non-null values for these fields when lineage is enabled. + +When an existing row is moved to a different data file for any reason, writers are required to write `_row_id` and `_last_updated_seq` according to the following rules: + +1. The row's existing non-null `_row_id` must be copied into the new data file +2. If the write has modified the row, the `_last_updated_seq` field must be set to `null` (so that the modification's sequence number replaces the current value) +3. If the write has not modified the row, the existing non-null `_last_updated_seq` value must be copied to the new data file + + +#### Row lineage example + +This example demonstrates how `_row_id` and `_last_updated_seq` are assigned for a snapshot when row lineage is enabled. This starts with a table with row lineage enabled and a `next-row-id` of 1000. + +Writing a new append snapshot would create snapshot metadata with `first-row-id` assigned to the table's `next-row-id`: + +```json +{ + "operation": "append", + "first-row-id": 1000, + ... +} +``` + +The snapshot's manifest list would contain existing manifests, plus new manifests with an assigned `first_row_id` based on the `added_rows_count` of previously listed added manifests: + +| `manifest_path` | `added_rows_count` | `existing_rows_count` | `first_row_id` | +|-----------------|--------------------|-----------------------|--------------------| +| ... | ... | ... | ... | +| existing | 75 | 0 | 925 | +| added1 | 100 | 25 | 1000 | +| added2 | 0 | 100 | 1100 | +| added3 | 125 | 25 | 1100 | + +The first added file, `added1`, is assigned the same `first_row_id` as the snapshot and the following manifests are assigned `first_row_id` based on the number of rows added by the previously listed manifests. The second file, `added2`, does not change the `first_row_id` of the next manifest because it contains no added data files. + +Within the first `added`, the first added manifest, each data file' `first_row_id` follows a similar pattern: + +| `status` | `file_path` | `record_count` | `first_row_id` | +|----------|-------------|----------------|----------------| +| EXISTING | data1 | 75 | 800 | +| ADDED | data2 | 50 | null (1000) | +| ADDED | data3 | 50 | null (1050) | + +The `first_row_id` of the EXISTING file `data1` was already assigned, so the file metadata was copied into manifest `added1`. + +Files `data2` and `data3` are written with `null` for `first_row_id` and are assigned `first_row_id` at read time based on the manifest's `first_row_id` and the `record_count` of previously listed ADDED files in this manifest: (1,000 + 0) and (1,000 + 50). + +When the new snapshot is committed, the table's `next-row-id` must also be updated (even if the new snapshot is not in the main branch). Because 225 rows were added (`added1`: 100 + `added2`: 0 + `added3`: 125), the new value is 1,000 + 225 = 1,225: Review Comment: +1, examples made it easy to follow. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org