amogh-jahagirdar commented on code in PR #11495:
URL: https://github.com/apache/iceberg/pull/11495#discussion_r1835155795


##########
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:
##########
@@ -769,6 +794,82 @@ protected void validateDataFilesExist(
     }
   }
 
+  // validates there are no concurrently added DVs for referenced data files
+  protected void validateAddedDVs(
+      TableMetadata base,
+      Long startingSnapshotId,
+      Expression conflictDetectionFilter,
+      Snapshot parent) {
+    // skip if there is no current table state or table format doesn't support DVs
+    if (parent == null || base.formatVersion() < 3) {
+      return;
+    }
+
+    // skip if this operation doesn't add new DVs
+    Set<String> dvRefs = dvRefs();
+    if (dvRefs.isEmpty()) {
+      return;
+    }
+
+    Pair<List<ManifestFile>, Set<Long>> history =
+        validationHistory(
+            base,
+            startingSnapshotId,
+            VALIDATE_ADDED_DVS_OPERATIONS,
+            ManifestContent.DELETES,
+            parent);
+    List<ManifestFile> newDeleteManifests = history.first();
+    Set<Long> newSnapshotIds = history.second();
+
+    Tasks.foreach(newDeleteManifests)
+        .stopOnFailure()
+        .throwFailureWhenFinished()
+        .executeWith(workerPool())
+        .run(m -> validateAddedDVs(m, conflictDetectionFilter, newSnapshotIds, dvRefs));
+  }
+
+  private void validateAddedDVs(
+      ManifestFile manifest,
+      Expression conflictDetectionFilter,
+      Set<Long> newSnapshotIds,
+      Set<String> dvRefs) {
+    try (CloseableIterable<ManifestEntry<DeleteFile>> entries =
+        ManifestFiles.readDeleteManifest(manifest, ops.io(), ops.current().specsById())
+            .filterRows(conflictDetectionFilter)
+            .caseSensitive(caseSensitive)
+            .liveEntries()) {
+
+      for (ManifestEntry<DeleteFile> entry : entries) {
+        DeleteFile file = entry.file();
+        if (newSnapshotIds.contains(entry.snapshotId()) && ContentFileUtil.isDV(file)) {
+          ValidationException.check(
+              !dvRefs.contains(file.referencedDataFile()),
+              "Found concurrently added DV for %s: %s",
+              file.referencedDataFile(),
+              ContentFileUtil.dvDesc(file));
+        }
+      }
+    } catch (IOException e) {
+      throw new UncheckedIOException(e);
+    }
+  }
+
+  // builds a set of data file locations referenced by new DVs
+  private Set<String> dvRefs() {

Review Comment:
   Should we call this `referencedDataFiles`? 



##########
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:
##########
@@ -769,6 +794,82 @@ protected void validateDataFilesExist(
     }
   }
 
+  // validates there are no concurrently added DVs for referenced data files
+  protected void validateAddedDVs(
+      TableMetadata base,
+      Long startingSnapshotId,
+      Expression conflictDetectionFilter,
+      Snapshot parent) {
+    // skip if there is no current table state or table format doesn't support DVs
+    if (parent == null || base.formatVersion() < 3) {
+      return;
+    }
+
+    // skip if this operation doesn't add new DVs
+    Set<String> dvRefs = dvRefs();
+    if (dvRefs.isEmpty()) {
+      return;
+    }
+
+    Pair<List<ManifestFile>, Set<Long>> history =
+        validationHistory(
+            base,
+            startingSnapshotId,
+            VALIDATE_ADDED_DVS_OPERATIONS,
+            ManifestContent.DELETES,
+            parent);
+    List<ManifestFile> newDeleteManifests = history.first();
+    Set<Long> newSnapshotIds = history.second();
+
+    Tasks.foreach(newDeleteManifests)
+        .stopOnFailure()
+        .throwFailureWhenFinished()
+        .executeWith(workerPool())
+        .run(m -> validateAddedDVs(m, conflictDetectionFilter, newSnapshotIds, dvRefs));

Review Comment:
   Nit: Should we use the full word `manifest` in the lambda?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

