amogh-jahagirdar commented on code in PR #11131:
URL: https://github.com/apache/iceberg/pull/11131#discussion_r1815717333


##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -323,11 +345,15 @@ private ManifestFile filterManifest(Schema tableSchema, 
ManifestFile manifest) {
       PartitionSpec spec = reader.spec();
       PartitionAndMetricsEvaluator evaluator =
           new PartitionAndMetricsEvaluator(tableSchema, spec, 
deleteExpression);
+      boolean hasDeletedFiles = 
manifestsReferencedForDeletes.contains(manifest.path());
+      if (hasDeletedFiles) {
+        return filterManifestWithDeletedFiles(evaluator, manifest, reader);
+      }
 
       // this assumes that the manifest doesn't have files to remove and 
streams through the
       // manifest without copying data. if a manifest does have a file to 
remove, this will break
       // out of the loop and move on to filtering the manifest.
-      boolean hasDeletedFiles = manifestHasDeletedFiles(evaluator, reader);
+      hasDeletedFiles = manifestHasDeletedFiles(evaluator, reader);

Review Comment:
   Fixed!



##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -341,43 +375,36 @@ private ManifestFile filterManifest(Schema tableSchema, 
ManifestFile manifest) {
   }
 
   private boolean canContainDeletedFiles(ManifestFile manifest) {
-    boolean canContainExpressionDeletes;
+    return canContainDroppedFiles(manifest)
+        || canContainExpressionDeletes(manifest)
+        || canContainDroppedPartitions(manifest);
+  }
+
+  private boolean canContainExpressionDeletes(ManifestFile manifest) {
     if (deleteExpression != null && deleteExpression != 
Expressions.alwaysFalse()) {
       ManifestEvaluator manifestEvaluator =
           ManifestEvaluator.forRowFilter(
               deleteExpression, specsById.get(manifest.partitionSpecId()), 
caseSensitive);
-      canContainExpressionDeletes = manifestEvaluator.eval(manifest);
-    } else {
-      canContainExpressionDeletes = false;
+      return manifestEvaluator.eval(manifest);
     }
 
-    boolean canContainDroppedPartitions;
+    return false;
+  }
+
+  private boolean canContainDroppedPartitions(ManifestFile manifest) {
     if (!dropPartitions.isEmpty()) {
-      canContainDroppedPartitions =
-          ManifestFileUtil.canContainAny(manifest, dropPartitions, specsById);
-    } else {
-      canContainDroppedPartitions = false;
+      return ManifestFileUtil.canContainAny(manifest, dropPartitions, 
specsById);
     }
 
-    boolean canContainDroppedFiles;
+    return false;
+  }
+
+  private boolean canContainDroppedFiles(ManifestFile manifest) {
     if (!deletePaths.isEmpty()) {
-      canContainDroppedFiles = true;
-    } else if (!deleteFiles.isEmpty()) {
-      // because there were no path-only deletes, the set of deleted file 
partitions is valid
-      canContainDroppedFiles =
-          ManifestFileUtil.canContainAny(manifest, deleteFilePartitions, 
specsById);
+      return true;
     } else {
-      canContainDroppedFiles = false;
+      return ManifestFileUtil.canContainAny(manifest, deleteFilePartitions, 
specsById);
     }
-
-    boolean canContainDropBySeq =
-        manifest.content() == ManifestContent.DELETES
-            && manifest.minSequenceNumber() < minSequenceNumber;

Review Comment:
   We used to rewrite a given manifest anytime it had aged deletes regardless 
of if the manifest needed to get rewritten for other purposes. Discussed with 
@rdblue and the way to look at this is that we should opportunistically remove 
the aged deletes when the manifest needs to get rewritten for other purposes.
   
    Otherwise we don't really need to go out of our way since it can create 
unnecessary failures which I think I agree with the rationale on that. I 
removed `canContainDropSeq` as an individual condition to drive a rewrite, 
which is technically a behavior change but arguably a better choice based on 
the argument that it avoids unnecessary failures that can happen by doing that 
work.
   
   I also updated TestRowDelta#testDeleteDataFileWithDeleteFile so that we 
still exercise the case where aged deletes do get rewritten, but the conditions 
for triggering that had to change in the tests. 



##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -162,11 +178,13 @@ void delete(F file) {
   void delete(CharSequence path) {
     Preconditions.checkNotNull(path, "Cannot delete file path: null");
     invalidateFilteredCache();
+    this.allDeletesReferenceManifests = false;
     deletePaths.add(path);
   }
 
   boolean containsDeletes() {
-    return !deletePaths.isEmpty()
+    return !manifestsReferencedForDeletes.isEmpty()

Review Comment:
   I've removed this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to