smaheshwar-pltr commented on code in PR #16796:
URL: https://github.com/apache/iceberg/pull/16796#discussion_r3407195566
##########
core/src/main/java/org/apache/iceberg/CatalogUtil.java:
##########
@@ -590,19 +590,20 @@ public static void deleteRemovedMetadataFiles(
TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED_DEFAULT);
if (deleteAfterCommit) {
- Set<TableMetadata.MetadataLogEntry> removedPreviousMetadataFiles =
- Sets.newHashSet(base.previousFiles());
- // TableMetadata#addPreviousFile builds up the metadata log and uses
- // TableProperties.METADATA_PREVIOUS_VERSIONS_MAX to determine how many
files should stay in
- // the log, thus we don't include metadata.previousFiles() for deletion
- everything else can
- // be removed
- removedPreviousMetadataFiles.removeAll(metadata.previousFiles());
- deleteFiles(
- io,
- removedPreviousMetadataFiles.stream()
- .map(TableMetadata.MetadataLogEntry::file)
- .collect(Collectors.toSet()),
- "metadata");
+ // base tracks its own metadata file too, so when
previous-versions-max=0 the superseded
+ // file is deleted here rather than left as an orphan
+ Set<String> removedMetadataFiles = trackedMetadataFiles(base);
+ removedMetadataFiles.removeAll(trackedMetadataFiles(metadata));
+ deleteFiles(io, removedMetadataFiles, "metadata");
}
}
+
+ private static Set<String> trackedMetadataFiles(TableMetadata metadata) {
+ Set<String> files =
Review Comment:
I feel like generally use newHashSet, also can we stick closer to the
previous flow if possible?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]