rdblue commented on code in PR #10962:
URL: https://github.com/apache/iceberg/pull/10962#discussion_r1859372481


##########
core/src/test/java/org/apache/iceberg/TestRewriteFiles.java:
##########
@@ -384,6 +386,116 @@ public void testRewriteDataAndAssignOldSequenceNumber() {
     assertThat(listManifestFiles()).hasSize(4);
   }
 
+  @TestTemplate
+  public void testRewriteDataAndAssignOldSequenceNumbersShouldNotDropDeleteFiles() {
+    assumeThat(formatVersion)
+        .as("Sequence number is only supported in iceberg format v2 or later")
+        .isGreaterThan(1);
+    assertThat(listManifestFiles()).isEmpty();
+
+    commit(table, table.newRowDelta().addRows(FILE_A).addDeletes(FILE_A2_DELETES), branch);
+
+    long firstRewriteSequenceNumber = latestSnapshot(table, branch).sequenceNumber();
+
+    commit(
+        table,
+        table.newRowDelta().addRows(FILE_B).addRows(FILE_B).addDeletes(FILE_B2_DELETES),
+        branch);
+    commit(
+        table,
+        table.newRowDelta().addRows(FILE_B).addRows(FILE_C).addDeletes(FILE_C2_DELETES),
+        branch);
+
+    long secondRewriteSequenceNumber = latestSnapshot(table, branch).sequenceNumber();
+
+    commit(
+        table,
+        table
+            .newRewrite()
+            .addFile(FILE_D)
+            .deleteFile(FILE_B)
+            .deleteFile(FILE_C)
+            .dataSequenceNumber(secondRewriteSequenceNumber),
+        branch);
+
+    TableMetadata base = readMetadata();
+    Snapshot baseSnap = latestSnapshot(base, branch);
+    long baseSnapshotId = baseSnap.snapshotId();
+
+    Comparator<ManifestFile> sequenceNumberOrdering =
+        new Comparator<>() {
+          @Override
+          public int compare(ManifestFile o1, ManifestFile o2) {
+            return (int) (o1.sequenceNumber() - o2.sequenceNumber());
+          }
+        };
+
+    // FILE_B2_DELETES and FILE_A2_DELETES should not be removed as the rewrite specifies
+    // `firstRewriteSequenceNumber`
+    // explicitly which is the same as that of A2_DELETES and before B2_DELETES
+
+    // Technically A1_DELETES could be removed since it's an equality delete and should apply on
+    // data sequences strictly
+    // smaller, so it's no longer needed. However, MergingSnapshotProducer calls
+    // dropDeleteFilesOlderThan
+    // which doesn't consider if the file is an equality delete, if that API is changed the equality
+    // delete file could be dropped sooner
+    Snapshot pending =
+        apply(
+            table
+                .newRewrite()
+                .addFile(FILE_A2)
+                .deleteFile(FILE_A)
+                .dataSequenceNumber(firstRewriteSequenceNumber),
+            branch);
+
+    assertThat(pending.allManifests(table.io())).hasSize(6);
+
+    long pendingId = pending.snapshotId();
+    List<ManifestFile> manifestFiles =
+        pending.allManifests(table.io()).stream()
+            .sorted(sequenceNumberOrdering.reversed())
+            .collect(Collectors.toList());
+    ManifestFile newManifest = manifestFiles.get(0);
+    validateManifestEntries(newManifest, ids(pendingId), files(FILE_A2), statuses(ADDED));
+
+    assertThat(ManifestFiles.read(newManifest, FILE_IO).entries())
+        .allSatisfy(
+            entry -> assertThat(entry.dataSequenceNumber()).isEqualTo(firstRewriteSequenceNumber));
+    assertThat(newManifest.sequenceNumber()).isEqualTo(secondRewriteSequenceNumber + 2);

Review Comment:
   Why not use the equivalent of `validateDeleteManifest` below where you can 
specify the data and file sequence numbers that should be validated? I don't 
see why this needs to validate the entries manually.
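
   For readers without the test base class at hand, here is a rough, self-contained sketch of the pattern being suggested: a single helper that walks the manifest entries and checks data sequence number, file sequence number, snapshot id, file, and status against iterators of expected values. Everything in it (`ManifestValidationSketch`, `Entry`, `Status`, this `validateManifest` signature, the file name, and the numeric values) is a hypothetical stand-in and not Iceberg's actual test API; the real helper's name and parameters should be taken from the test base class.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-ins for illustration only; none of these are Iceberg classes.
final class ManifestValidationSketch {
  enum Status { EXISTING, ADDED, DELETED }

  // Minimal model of a manifest entry holding only the fields being validated.
  static final class Entry {
    final long dataSeq;
    final long fileSeq;
    final long snapshotId;
    final String path;
    final Status status;

    Entry(long dataSeq, long fileSeq, long snapshotId, String path, Status status) {
      this.dataSeq = dataSeq;
      this.fileSeq = fileSeq;
      this.snapshotId = snapshotId;
      this.path = path;
      this.status = status;
    }
  }

  // Walks the entries in order and compares each field with the next expected value.
  static void validateManifest(
      List<Entry> entries,
      Iterator<Long> dataSeqs,
      Iterator<Long> fileSeqs,
      Iterator<Long> ids,
      Iterator<String> paths,
      Iterator<Status> statuses) {
    for (Entry entry : entries) {
      expect("data sequence number", next(dataSeqs), entry.dataSeq);
      expect("file sequence number", next(fileSeqs), entry.fileSeq);
      expect("snapshot id", next(ids), entry.snapshotId);
      expect("path", next(paths), entry.path);
      expect("status", next(statuses), entry.status);
    }
    if (dataSeqs.hasNext()) {
      throw new AssertionError("Manifest has fewer entries than expected");
    }
  }

  // Fails with an assertion error instead of NoSuchElementException when expectations run out.
  private static <T> T next(Iterator<T> expected) {
    if (!expected.hasNext()) {
      throw new AssertionError("Manifest has more entries than expected");
    }
    return expected.next();
  }

  private static void expect(String field, Object expected, Object actual) {
    if (!Objects.equals(expected, actual)) {
      throw new AssertionError(field + ": expected " + expected + " but was " + actual);
    }
  }

  public static void main(String[] args) {
    // Illustrative values: an entry added with an old data sequence number (1)
    // in a manifest written at a later file sequence number (5).
    List<Entry> entries = Arrays.asList(new Entry(1L, 5L, 42L, "file-a2.parquet", Status.ADDED));
    validateManifest(
        entries,
        Arrays.asList(1L).iterator(),
        Arrays.asList(5L).iterator(),
        Arrays.asList(42L).iterator(),
        Arrays.asList("file-a2.parquet").iterator(),
        Arrays.asList(Status.ADDED).iterator());
  }
}
```

   With a helper of this shape, the separate `allSatisfy` check on `dataSequenceNumber()` would be folded into one call that states all expectations for the new manifest's entries in one place.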



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

