rdblue commented on code in PR #15006:
URL: https://github.com/apache/iceberg/pull/15006#discussion_r2825064995
##########
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:
##########
@@ -1065,9 +1062,34 @@ private List<ManifestFile> newDeleteFilesAsManifests() {
}
if (cachedNewDeleteManifests.isEmpty()) {
+ for (Map.Entry<String, List<DeleteFile>> entry :
dvsByReferencedFile.entrySet()) {
+ if (entry.getValue().size() > 1) {
+ LOG.warn(
+ "Attempted to commit {} duplicate DVs for data file {} in table
{}. "
+ + "Merging duplicates, and original DVs will be orphaned.",
Review Comment:
Will be orphaned? I think this is a bit strong of a warning and could lead
to incorrect issue reports.
It would be nice to detect whether the puffin file is actually orphaned by
checking whether it has other DVs. It's fine to leave a Puffin file in place if
it has live DVs from this commit. And it's fine to delete it if we see that all
of its DVs will be merged.
We probably don't need to do all that in this commit, though. I'd probably
leave this out and just have a warning about merging, without the part about
orphan DVs.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]