rdblue commented on code in PR #6090:
URL: https://github.com/apache/iceberg/pull/6090#discussion_r1092367108


##########
core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java:
##########
@@ -1234,6 +1243,95 @@ public void testMultipleRefsAndCleanExpiredFilesFailsForIncrementalCleanup() {
                 .commit());
   }
 
+  @Test
+  public void testExpireWithStatisticsFiles() throws IOException {
+    table.newAppend().appendFile(FILE_A).commit();
+    String statsFileLocation1 = statsFileLocation(table.location());
+    StatisticsFile statisticsFile1 =
+        writeStatsFileForCurrentSnapshot(
+            table.currentSnapshot().snapshotId(),
+            table.currentSnapshot().sequenceNumber(),
+            statsFileLocation1,
+            table.io());
+    commitStats(table.newTransaction(), statisticsFile1);
+    Assert.assertEquals(
+        "Must match the latest snapshot",
+        table.currentSnapshot().snapshotId(),
+        statisticsFile1.snapshotId());
+
+    table.newAppend().appendFile(FILE_B).commit();
+    String statsFileLocation2 = statsFileLocation(table.location());
+    StatisticsFile statisticsFile2 =
+        writeStatsFileForCurrentSnapshot(
+            table.currentSnapshot().snapshotId(),
+            table.currentSnapshot().sequenceNumber(),
+            statsFileLocation2,
+            table.io());
+    commitStats(table.newTransaction(), statisticsFile2);
+    Assert.assertEquals(
+        "Must match the latest snapshot",
+        table.currentSnapshot().snapshotId(),
+        statisticsFile2.snapshotId());
+
+    Assert.assertEquals("Should have 2 statistics files", 2, table.statisticsFiles().size());
+
+    long tAfterCommits = waitUntilAfter(table.currentSnapshot().timestampMillis());
+    removeSnapshots(table).expireOlderThan(tAfterCommits).commit();
+
+    Assert.assertEquals("Should keep 1 snapshot", 1, Iterables.size(table.snapshots()));
+    Assertions.assertThat(table.statisticsFiles())
+        .hasSize(1)
+        .extracting(StatisticsFile::snapshotId)
+        .as("Should contain only the statistics file of snapshot2")
+        .isEqualTo(Lists.newArrayList(statisticsFile2.snapshotId()));
+
+    Assertions.assertThat(new File(statsFileLocation1).exists()).isFalse();
+    Assertions.assertThat(new File(statsFileLocation2).exists()).isTrue();
+  }
+
+  @Test
+  public void testExpireWithStatisticsFilesWithReuse() throws IOException {
+    table.newAppend().appendFile(FILE_A).commit();
+    String statsFileLocation1 = statsFileLocation(table.location());
+    StatisticsFile statisticsFile1 =
+        writeStatsFileForCurrentSnapshot(
+            table.currentSnapshot().snapshotId(),
+            table.currentSnapshot().sequenceNumber(),
+            statsFileLocation1,
+            table.io());
+    commitStats(table.newTransaction(), statisticsFile1);
+    Assert.assertEquals(
+        "Must match the latest snapshot",
+        table.currentSnapshot().snapshotId(),
+        statisticsFile1.snapshotId());
+
+    table.newAppend().appendFile(FILE_B).commit();
+    // Note: RewriteDataFiles can reuse statistics files across operations.
+    // This test reuses stats for append just to mimic this scenario without having to run
+    // RewriteDataFiles.

Review Comment:
   Does this actually happen in `RewriteDataFiles`? I don't think that the same 
stats file should be added more than once. It's a good idea to make sure it 
doesn't, but that should not be the behavior of built-in operations.
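The underlying concern is that, if the same statistics file ever ends up registered for more than one snapshot, expiring the older snapshot must not delete a file that a retained snapshot still references. Below is a minimal sketch of one way to guard against that. It uses plain Java collections and a hypothetical `StatsEntry` record, not Iceberg's `StatisticsFile` or the actual `RemoveSnapshots` cleanup code. The idea is to collect the paths still referenced by retained snapshots and delete only expired paths outside that set.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StatsFileCleanupSketch {

  // Hypothetical stand-in for one statistics entry in table metadata:
  // the snapshot it describes and the location of the file on storage.
  record StatsEntry(long snapshotId, String path) {}

  // Returns the statistics file paths that are safe to delete once every snapshot
  // not in retainedSnapshotIds has been expired. A path is deletable only if no
  // retained snapshot still references it, so a file reused by a later snapshot survives.
  static Set<String> deletableStatsFiles(List<StatsEntry> entries, Set<Long> retainedSnapshotIds) {
    Set<String> retainedPaths = new HashSet<>();
    Set<String> expiredPaths = new HashSet<>();
    for (StatsEntry entry : entries) {
      if (retainedSnapshotIds.contains(entry.snapshotId())) {
        retainedPaths.add(entry.path());
      } else {
        expiredPaths.add(entry.path());
      }
    }
    // Set difference: expired paths minus anything a retained snapshot still points at.
    expiredPaths.removeAll(retainedPaths);
    return expiredPaths;
  }

  public static void main(String[] args) {
    // Distinct files: expiring snapshot 1 frees its file; snapshot 2's file is kept.
    List<StatsEntry> distinct =
        List.of(new StatsEntry(1L, "stats-1.stats"), new StatsEntry(2L, "stats-2.stats"));
    System.out.println(deletableStatsFiles(distinct, Set.of(2L))); // prints [stats-1.stats]

    // Reused file: both snapshots point at the same path, so nothing may be deleted.
    List<StatsEntry> reused =
        List.of(new StatsEntry(1L, "stats-1.stats"), new StatsEntry(2L, "stats-1.stats"));
    System.out.println(deletableStatsFiles(reused, Set.of(2L))); // prints []
  }
}
```

The first call mirrors `testExpireWithStatisticsFiles`, where only the expired snapshot's file is removed. The second mirrors the reuse scenario that the second test sets up, where the shared file must survive expiration.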


