amogh-jahagirdar commented on code in PR #17039:
URL: https://github.com/apache/iceberg/pull/17039#discussion_r3520874422
##########
spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java:
##########
@@ -2069,6 +2069,36 @@ public void
testUnpartitionedRewriteDataFilesPreservesLineage() throws NoSuchTab
assertEquals("Rows must match", expectedRecordsWithLineage,
actualRecordsWithLineage);
}
+ @TestTemplate
Review Comment:
Feel free to keep your test name as is, just what i had locally
##########
spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java:
##########
@@ -2069,6 +2069,36 @@ public void
testUnpartitionedRewriteDataFilesPreservesLineage() throws NoSuchTab
assertEquals("Rows must match", expectedRecordsWithLineage,
actualRecordsWithLineage);
}
+ @TestTemplate
Review Comment:
I think this looks right but I'd cleanup the empty append and do a more
realistic write, I'd also just use an explicit assertion on the records (just a
bit more readable/less opaque)
```
+ @TestTemplate
+ public void
testLastUpdatedSequenceInheritedFromDataSequenceAfterUpgrade() {
+ // Reproduces the inheritance bug for a v2-origin file carried into v3.
+ assumeThat(formatVersion).isEqualTo(2);
+
+ Table table = createTable();
+ writeRecords(2 /* files */, 4 /* records */);
+ table.refresh();
+ shouldHaveFiles(table, 2);
+ long originalSequenceNumber = table.currentSnapshot().sequenceNumber();
+ // Perform a compaction before upgrade which preserves the original
sequence number
+ basicRewrite(table).execute();
+ table.refresh();
+ shouldHaveFiles(table, 1);
+
+ DataFile compacted = Iterables.getOnlyElement(currentDataFiles(table));
+
assertThat(compacted.dataSequenceNumber()).isEqualTo(originalSequenceNumber);
+ assertThat(compacted.fileSequenceNumber())
+ .as("Compaction must bump the file sequence number above the
preserved data sequence")
+ .isGreaterThan(originalSequenceNumber);
+
+ // Upgrade to v3 so the row lineage metadata columns are exposed on
read.
+ table.updateProperties().set(TableProperties.FORMAT_VERSION,
"3").commit();
+ table.refresh();
+
+ writeRecords(1 /* files */, 4 /* records */);
+ table.refresh();
+ long appendSequenceNumber = table.currentSnapshot().sequenceNumber();
+
+ // currentDataWithLineage() projects (_row_id,
_last_updated_sequence_number, *) ordered by
+ // _row_id; the trailing data columns are irrelevant here, so match
them with ANY.
+ List<Object[]> actualLineage = currentDataWithLineage();
+
+ List<Object[]> expectedLineage =
+ Lists.newArrayList(
+ // The post-upgrade append writes a new file, which is assigned
the first row IDs (0-3)
+ // and carries the append commit's sequence number as its
last-updated sequence.
+ row(0L, appendSequenceNumber, ANY, ANY, ANY),
+ row(1L, appendSequenceNumber, ANY, ANY, ANY),
+ row(2L, appendSequenceNumber, ANY, ANY, ANY),
+ row(3L, appendSequenceNumber, ANY, ANY, ANY),
+ row(4L, originalSequenceNumber, ANY, ANY, ANY),
+ row(5L, originalSequenceNumber, ANY, ANY, ANY),
+ row(6L, originalSequenceNumber, ANY, ANY, ANY),
+ row(7L, originalSequenceNumber, ANY, ANY, ANY));
+
+ assertEquals(
+ "Carried-over rows must inherit the data sequence",
+ expectedLineage,
+ actualLineage);
+ }
+
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]