RussellSpitzer commented on code in PR #12672:
URL: https://github.com/apache/iceberg/pull/12672#discussion_r2037967837
##########
core/src/main/java/org/apache/iceberg/SnapshotProducer.java:
##########
@@ -283,11 +285,26 @@ public Snapshot apply() {
       throw new RuntimeIOException(e, "Failed to write manifest list file");
     }
 
-    Long addedRows = null;
-    Long lastRowId = null;
-    if (base.rowLineageEnabled()) {
-      addedRows = calculateAddedRows(manifests);
-      lastRowId = base.nextRowId();
+    Long assignedRows = null;
+    if (base.formatVersion() >= 3) {
+      assignedRows = writer.nextRowId() - base.nextRowId();

Review Comment:
I'm now struggling with branching situations. I think we may end up assigning different IDs to the same rows if they exist in multiple branches. This may be OK, though.

If we are OK with that, we could just go to every snapshot without any parents. We treat all rows in that snapshot as having been added in that snapshot, which means all children of that snapshot start at least at the total row count of the snapshot.

For example:

[Main 0] -> [Main 1] -> [Main 2] -> [Alt 1]
[Branch 0] -> [Branch 1]

We start with last-row-id = 0. First we detect the root snapshots (the ones without parents), Main 0 and Branch 0. For each of these we take the total number of records in the snapshot. So, assuming Main 0 has 100 total rows, we set last-row-id to 100 and then visit all of the descendants of Main 0.

Visit Main 1: set first-row-id to 100, add added rows to last-row-id (100 + 50 = 150).
Visit Main 2: set first-row-id to 150, add added rows to last-row-id (150 + 25 = 175).
Visit Alt 1: set first-row-id to 175, add added rows to last-row-id (175 + 25 = 200).
Visit Branch 0: set first-row-id to 200, add **total row count** to last-row-id (200 + 50 = 250).
Visit Branch 1: set first-row-id to 250, add added rows to last-row-id (250 + 10 = 260).

Finally, set metadata.json last-row-id = 260.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
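The traversal proposed in the review comment can be sketched roughly as below. This is only an illustration under stated assumptions, not Iceberg code: `RowIdBackfillSketch`, `Snap`, `root`, `child`, and `assign` are hypothetical names. The sketch also assumes that Alt 1 is a child of Main 2 and that snapshots are visited depth-first in list order, neither of which the comment pins down.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the proposed first-row-id assignment; not an Iceberg API. */
class RowIdBackfillSketch {

  /** Minimal stand-in for a snapshot (hypothetical type). */
  static final class Snap {
    final String name;
    final String parent;      // null for a snapshot without a parent
    final long rowsToAssign;  // total rows for a root, added rows for a child
    final List<Snap> children = new ArrayList<>();
    long firstRowId = -1;     // filled in by assign()

    private Snap(String name, String parent, long rowsToAssign) {
      this.name = name;
      this.parent = parent;
      this.rowsToAssign = rowsToAssign;
    }

    /** A root treats every row it contains as added in that snapshot. */
    static Snap root(String name, long totalRows) {
      return new Snap(name, null, totalRows);
    }

    static Snap child(String name, String parent, long addedRows) {
      return new Snap(name, parent, addedRows);
    }
  }

  /** Assigns firstRowId to every snapshot and returns the final last-row-id. */
  static long assign(List<Snap> snapshots) {
    Map<String, Snap> byName = new LinkedHashMap<>();
    for (Snap s : snapshots) {
      byName.put(s.name, s);
    }
    List<Snap> roots = new ArrayList<>();
    for (Snap s : snapshots) {
      if (s.parent == null) {
        roots.add(s);
      } else {
        byName.get(s.parent).children.add(s);
      }
    }
    long lastRowId = 0;
    for (Snap root : roots) {
      lastRowId = visit(root, lastRowId);
    }
    return lastRowId;
  }

  /** Depth-first visit: record the current counter, then advance it. */
  private static long visit(Snap s, long lastRowId) {
    s.firstRowId = lastRowId;
    long next = lastRowId + s.rowsToAssign;
    for (Snap c : s.children) {
      next = visit(c, next);
    }
    return next;
  }

  /** The snapshot graph and row counts from the review comment's example. */
  static List<Snap> example() {
    List<Snap> snaps = new ArrayList<>();
    snaps.add(Snap.root("Main 0", 100));
    snaps.add(Snap.child("Main 1", "Main 0", 50));
    snaps.add(Snap.child("Main 2", "Main 1", 25));
    snaps.add(Snap.child("Alt 1", "Main 2", 25)); // parent is an assumption
    snaps.add(Snap.root("Branch 0", 50));
    snaps.add(Snap.child("Branch 1", "Branch 0", 10));
    return snaps;
  }

  public static void main(String[] args) {
    List<Snap> snaps = example();
    long lastRowId = assign(snaps);
    for (Snap s : snaps) {
      System.out.println(s.name + " first-row-id=" + s.firstRowId);
    }
    System.out.println("last-row-id=" + lastRowId); // last-row-id=260
  }
}
```

Because each root contributes its full row count, rows shared between Main 0 and Branch 0 would receive two different ID ranges in this sketch, which is the trade-off the comment raises.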
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org