tomtongue opened a new pull request, #12307: URL: https://github.com/apache/iceberg/pull/12307
## Overview

Fix the `row-lineage` table property reflection on `enableRowLineage`.

## Issue

Currently, enabling the Row Lineage feature through the Iceberg table properties requires two operations:

1. Create an Iceberg table
2. Update the table properties

In the first step, "Create an Iceberg table", setting `row-lineage` to `true` in the table properties is not reflected in the Iceberg table's metadata.json. To enable the feature, you therefore have to run an additional table properties update after creating the table.

### Details

#### Spark case

Suppose you create an Iceberg table using Spark with a query like the following:

```
spark.sql("""
CREATE TABLE db.rowlin (id int, name string, year int) USING iceberg
TBLPROPERTIES ('format-version'='3', 'row-lineage'='true')
LOCATION 's3://bucket/iceberg-v3/row-lineage'
""")
```

The relevant metadata.json is stored in the specified bucket and path:

```
aws s3 ls s3://bucket/iceberg-v3/row-lineage/ --recursive
2025-02-18 16:56:28       1194 iceberg-v3/row-lineage/metadata/00000-1eb8c96e-f503-4ff9-b4e0-53cb3ede0116.metadata.json
```

At this point, the (partial) metadata content is shown below. It has no top-level `row-lineage` field, even though the property is present in the `properties` section.

```json
{
  "format-version" : 3,
  "table-uuid" : "eaf5dec9-7866-49a5-81c6-11af8f344e1f",
  "location" : "s3://bucket/iceberg-v3/row-lineage",
  "last-sequence-number" : 0,
  "last-updated-ms" : 1739865386995,
  "last-column-id" : 3,
  "current-schema-id" : 0,
  "schemas" : [ {
    "type" : "struct",
    "schema-id" : 0,
    "fields" : [ { ... } ]
  } ],
  "default-spec-id" : 0,
  "partition-specs" : [ {
    "spec-id" : 0,
    "fields" : [ ]
  } ],
  "last-partition-id" : 999,
  "default-sort-order-id" : 0,
  "sort-orders" : [ {
    "order-id" : 0,
    "fields" : [ ]
  } ],
  "properties" : {
    "owner" : "hadoop",
    "write.update.mode" : "merge-on-read",
    "write.parquet.compression-codec" : "zstd",
    "row-lineage" : "true"
  },
  "current-snapshot-id" : null,
  ...
}
```

Next, update the table with the same property, for example `ALTER TABLE db.rowlin SET TBLPROPERTIES('row-lineage'= 'true')`. After the query completes, the new metadata.json looks like the following. The top-level `row-lineage` and `next-row-id` fields have been added.

```json
{
  "format-version" : 3,
  "table-uuid" : "eaf5dec9-7866-49a5-81c6-11af8f344e1f",
  "location" : "s3://bucket/iceberg-v3/row-lineage",
  "last-sequence-number" : 0,
  "last-updated-ms" : 1739865514775,
  "last-column-id" : 3,
  "current-schema-id" : 0,
  "schemas" : [ {
    "type" : "struct",
    "schema-id" : 0,
    "fields" : [ { ... } ]
  } ],
  "default-spec-id" : 0,
  "partition-specs" : [ {
    "spec-id" : 0,
    "fields" : [ ]
  } ],
  "last-partition-id" : 999,
  "default-sort-order-id" : 0,
  "sort-orders" : [ {
    "order-id" : 0,
    "fields" : [ ]
  } ],
  "properties" : {
    "owner" : "hadoop",
    "write.update.mode" : "merge-on-read",
    "write.parquet.compression-codec" : "zstd",
    "row-lineage" : "true"
  },
  "current-snapshot-id" : null,
  "row-lineage" : true, // <= ADDED
  "next-row-id" : 0, // <= ADDED
  "refs" : { },
  "snapshots" : [ ],
  "statistics" : [ ],
  "partition-statistics" : [ ],
  "snapshot-log" : [ ],
  "metadata-log" : [ {
    "timestamp-ms" : 1739865386995,
    "metadata-file" : "s3://bucket/iceberg-v3/row-lineage/metadata/00000-1eb8c96e-f503-4ff9-b4e0-53cb3ede0116.metadata.json"
  } ]
}
```

Here's the diff between the two metadata files:

```diff
$ diff 00000-1eb8c96e-f503-4ff9-b4e0-53cb3ede0116.metadata.json 00001-ebf641c8-9603-45d5-92c6-dafac315375e.metadata.json
6c6
< "last-updated-ms" : 1739865386995,
---
> "last-updated-ms" : 1739865514775,
46a47,48
> "row-lineage" : true,
> "next-row-id" : 0,
52c54,57
< "metadata-log" : [ ]
---
> "metadata-log" : [ {
> "timestamp-ms" : 1739865386995,
> "metadata-file" : "s3://gsweep/iceberg-v3/row-lineage-mor13/metadata/00000-1eb8c96e-f503-4ff9-b4e0-53cb3ede0116.metadata.json"
> } ]
```
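
The mismatch described in this PR can also be checked programmatically. The following is only a minimal sketch, not part of the change itself: `row_lineage_mismatch` is a hypothetical helper name, and the two dictionaries are trimmed copies of the metadata files from this description rather than files read from S3. It flags metadata where the `row-lineage` property is `"true"` but the top-level `row-lineage` field is absent.

```python
import json


def row_lineage_mismatch(metadata: dict) -> bool:
    """Hypothetical helper: True when the `row-lineage` table property is
    "true" but the top-level `row-lineage` metadata field is not set to true."""
    properties = metadata.get("properties", {})
    requested = str(properties.get("row-lineage", "false")).lower() == "true"
    enabled = metadata.get("row-lineage") is True
    return requested and not enabled


# Trimmed copy of the metadata written by CREATE TABLE (no top-level field).
after_create = json.loads("""
{
  "format-version": 3,
  "properties": {"owner": "hadoop", "row-lineage": "true"},
  "current-snapshot-id": null
}
""")

# Trimmed copy of the metadata written after ALTER TABLE ... SET TBLPROPERTIES.
after_alter = json.loads("""
{
  "format-version": 3,
  "properties": {"owner": "hadoop", "row-lineage": "true"},
  "current-snapshot-id": null,
  "row-lineage": true,
  "next-row-id": 0
}
""")

print(row_lineage_mismatch(after_create))  # True: property set, field missing
print(row_lineage_mismatch(after_alter))   # False: property and field agree
```

To run the same check against a real table, the metadata.json would first have to be downloaded and loaded with `json.load`; that step is left out here because it depends on the storage setup.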