amogh-jahagirdar commented on code in PR #15653:
URL: https://github.com/apache/iceberg/pull/15653#discussion_r2991663581


##########
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:
##########
@@ -866,6 +871,38 @@ private void validateAddedDVs(
     }
   }
 
+  private Iterable<ManifestFile> filterManifestsByPartition(
+      TableMetadata base, Expression conflictDetectionFilter, 
List<ManifestFile> manifests) {
+    if (conflictDetectionFilter == null || conflictDetectionFilter == 
Expressions.alwaysTrue()) {
+      return manifests;
+    }
+
+    // if any concurrent manifest was written with a different partition spec, 
skip pruning
+    // to avoid incorrectly excluding manifests when a spec change happened 
during validation
+    int defaultSpecId = base.defaultSpecId();
+    if (manifests.stream().anyMatch(m -> m.partitionSpecId() != 
defaultSpecId)) {
+      return manifests;
+    }
+
+    Map<Integer, PartitionSpec> specsById = base.specsById();
+    Map<Integer, ManifestEvaluator> evaluators = Maps.newHashMap();
+    return Iterables.filter(

Review Comment:
   Should this also filter where `manifest.hasAddedFiles()`? That way we don't 
have to bother opening up manifests which were just a result of a metadata 
compaction (only existing files in those). If so, may want to also rename the 
method/pass in an additional predicate<manifest> or something. 



##########
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:
##########
@@ -836,7 +838,10 @@ protected void validateAddedDVs(
     List<ManifestFile> newDeleteManifests = history.first();
     Set<Long> newSnapshotIds = history.second();
 
-    Tasks.foreach(newDeleteManifests)
+    Iterable<ManifestFile> matchingManifests =

Review Comment:
   As a side note, I think there are cases where we can just merge the 
conflicting added DVs rather than fail; we can take that on a follow on, since 
this PR is an improvement over the current behavior. Just mentioning it since 
that may impact how we want to structure all this logic.



##########
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:
##########
@@ -836,7 +838,10 @@ protected void validateAddedDVs(
     List<ManifestFile> newDeleteManifests = history.first();
     Set<Long> newSnapshotIds = history.second();
 
-    Tasks.foreach(newDeleteManifests)
+    Iterable<ManifestFile> matchingManifests =

Review Comment:
   It's a bit unfortunate ManifestGroup is tied to data files, it does expose a 
lot of the primitives we need so we don't have to re-implement them.



##########
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:
##########
@@ -866,6 +871,38 @@ private void validateAddedDVs(
     }
   }
 
+  private Iterable<ManifestFile> filterManifestsByPartition(
+      TableMetadata base, Expression conflictDetectionFilter, 
List<ManifestFile> manifests) {
+    if (conflictDetectionFilter == null || conflictDetectionFilter == 
Expressions.alwaysTrue()) {
+      return manifests;
+    }
+
+    // if any concurrent manifest was written with a different partition spec, 
skip pruning
+    // to avoid incorrectly excluding manifests when a spec change happened 
during validation
+    int defaultSpecId = base.defaultSpecId();
+    if (manifests.stream().anyMatch(m -> m.partitionSpecId() != 
defaultSpecId)) {
+      return manifests;
+    }
+
+    Map<Integer, PartitionSpec> specsById = base.specsById();
+    Map<Integer, ManifestEvaluator> evaluators = Maps.newHashMap();
+    return Iterables.filter(

Review Comment:
   Or just keep it as is and add the filtering after the invocation of 
filterManifestsByPartition, that's probably cleaner separation of concerns



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to