RussellSpitzer commented on code in PR #12672:
URL: https://github.com/apache/iceberg/pull/12672#discussion_r2037967837


##########
core/src/main/java/org/apache/iceberg/SnapshotProducer.java:
##########
@@ -283,11 +285,26 @@ public Snapshot apply() {
       throw new RuntimeIOException(e, "Failed to write manifest list file");
     }
 
-    Long addedRows = null;
-    Long lastRowId = null;
-    if (base.rowLineageEnabled()) {
-      addedRows = calculateAddedRows(manifests);
-      lastRowId = base.nextRowId();
+    Long assignedRows = null;
+    if (base.formatVersion() >= 3) {
+      assignedRows = writer.nextRowId() - base.nextRowId();

Review Comment:
   I'm struggling with the branching situations here. I think we may end up 
assigning different IDs to the same rows if they exist in multiple branches. 
This may be OK though.
   
   If we are OK with that, we could just go to every snapshot without any 
parents and treat all rows in that snapshot as having been added in that 
snapshot. This means all children of that snapshot start at or above the 
total row count of the snapshot.
   
   For example
   
   [Main 0] -> [Main 1] -> [Main 2]
                  ->  [Alt 1]
   
   [Branch 0] -> [Branch 1]
   
   We start with last-row-id = 0.
   
   First we detect the roots (snapshots without parents):
   Main 0 and Branch 0.
   
   For each of these we get the total number of records in the snapshot.
   So assuming Main 0 has 100 total rows, we set last-row-id to 100 and then 
visit all of the children of Main 0.
   
   Visit Main 1, set first-row-id to 100, add added rows to last-row-id (100 + 
50 = 150)
   Visit Main 2, set first-row-id to 150, add added rows to last-row-id (150 + 
25 = 175)
   Visit Alt 1, set first-row-id to 175, add added rows to last-row-id (175 + 
25 = 200)
   
   Visit Branch 0, set first-row-id to 200, add **total row count** to 
last-row-id (200 + 50 = 250)
   Visit Branch 1, set first-row-id to 250, add added rows to last-row-id (250 
+ 10 = 260)
   
   Finally set metadata.json last-row-id = 260
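   
   A rough sketch of the walk above, only to make the arithmetic concrete. 
`Node`, `assign`, and `runExample` are hypothetical names for illustration, 
not Iceberg APIs. The diagram does not say which snapshot Alt 1 forks from, so 
the sketch assumes Main 1. It also assumes a depth-first visit in the order 
snapshots were added.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RowIdBackfillSketch {

  /** Hypothetical stand-in for a snapshot; not Iceberg's Snapshot class. */
  static final class Node {
    final String name;
    final Node parent; // null for a root (snapshot without a parent)
    final long rows;   // total rows for a root, added rows otherwise
    final List<Node> children = new ArrayList<>();

    Node(String name, Node parent, long rows) {
      this.name = name;
      this.parent = parent;
      this.rows = rows;
      if (parent != null) {
        parent.children.add(this);
      }
    }
  }

  /** Visits each root and its descendants; returns the final last-row-id. */
  static long assign(List<Node> roots, Map<String, Long> firstRowIds) {
    long lastRowId = 0L;
    for (Node root : roots) {
      lastRowId = visit(root, lastRowId, firstRowIds);
    }
    return lastRowId;
  }

  private static long visit(Node node, long lastRowId, Map<String, Long> firstRowIds) {
    // The snapshot starts at the current last-row-id ...
    firstRowIds.put(node.name, lastRowId);
    // ... and advances it by total rows (root) or added rows (non-root).
    long next = lastRowId + node.rows;
    for (Node child : node.children) {
      next = visit(child, next, firstRowIds);
    }
    return next;
  }

  /** Builds the example from the comment and fills firstRowIds. */
  static long runExample(Map<String, Long> firstRowIds) {
    Node main0 = new Node("Main 0", null, 100);
    Node main1 = new Node("Main 1", main0, 50);
    new Node("Main 2", main1, 25);
    new Node("Alt 1", main1, 25); // assumption: Alt 1 forks from Main 1
    Node branch0 = new Node("Branch 0", null, 50);
    new Node("Branch 1", branch0, 10);
    return assign(List.of(main0, branch0), firstRowIds);
  }

  public static void main(String[] args) {
    Map<String, Long> firstRowIds = new LinkedHashMap<>();
    long lastRowId = runExample(firstRowIds);
    firstRowIds.forEach((name, id) -> System.out.println(name + " first-row-id=" + id));
    System.out.println("metadata last-row-id=" + lastRowId); // 260
  }
}
```

   Under these assumptions the sketch reproduces the numbers above: Main 1 
starts at 100, Main 2 at 150, Alt 1 at 175, Branch 0 at 200, Branch 1 at 250, 
and the final last-row-id is 260.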



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

