aokolnychyi commented on code in PR #11131:
URL: https://github.com/apache/iceberg/pull/11131#discussion_r1849434684


##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -307,12 +331,22 @@ private void invalidateFilteredCache() {
   /**
    * @return a ManifestReader that is a filtered version of the input manifest.
    */
-  private ManifestFile filterManifest(Schema tableSchema, ManifestFile manifest) {
+  private ManifestFile filterManifest(
+      Schema tableSchema, ManifestFile manifest, boolean trustReferencedManifests) {
     ManifestFile cached = filteredManifests.get(manifest);
     if (cached != null) {
       return cached;
     }
 
+    boolean manifestIsReferenced = manifestsReferencedForDeletes.contains(manifest.path());
+
+    // The manifest does not need to be rewritten if the referenced set can be trusted and the
+    // manifest is not referenced
+    if (trustReferencedManifests && !manifestIsReferenced) {

Review Comment:
   I wonder whether we can restructure this a bit, since there are now 
   separate branches that all just skip the rewrite. What about a common 
   `canContainDeletedFiles` check, so that the method body looks something like this?
   
   ```
   if (!canContainDeletedFiles(manifest, trustManifestReferences)) {
     filteredManifests.put(manifest, manifest);
     return manifest;
   }
   
   try (ManifestReader<F> reader = newManifestReader(manifest)) {
     PartitionSpec spec = reader.spec();
     PartitionAndMetricsEvaluator evaluator =
         new PartitionAndMetricsEvaluator(tableSchema, spec, deleteExpression);
     if (manifestHasDeletedFiles(evaluator, reader)) {
       return filterManifestWithDeletedFiles(evaluator, manifest, reader);
     } else {
       filteredManifests.put(manifest, manifest);
       return manifest;
     }
   } catch (IOException e) {
     throw new RuntimeIOException(e, "Failed to close manifest: %s", manifest);
   }
   ```
   
   With helper methods:
   
   ```
   private boolean canContainDeletedFiles(ManifestFile manifest, boolean trustManifestReferences) {
     if (hasNoLiveFiles(manifest)) {
       return false;
     }
   
     if (trustManifestReferences) {
       return manifestsWithDeletes.contains(manifest.path());
     }
   
     return canContainDroppedFiles(manifest)
         || canContainExpressionDeletes(manifest)
         || canContainDroppedPartitions(manifest);
   }
   
   private boolean hasNoLiveFiles(ManifestFile manifest) {
     return !manifest.hasAddedFiles() && !manifest.hasExistingFiles();
   }
   ```
   
   And an extra check in `manifestHasDeletedFiles`:
   
   ```
   private boolean manifestHasDeletedFiles(
       PartitionAndMetricsEvaluator evaluator, ManifestReader<F> reader) {
     if (manifestsWithDeletes.contains(reader.file().location())) {
       return true;
     }
   
     ...
   }
   ```
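
To make the decision order in the proposed `canContainDeletedFiles` easier to check, here is a minimal standalone sketch of it. It does not use Iceberg classes: `Manifest` is a hypothetical stand-in for the few `ManifestFile` properties the check reads, and `conservativeCheck` is an assumed placeholder for the combined `canContainDroppedFiles` / `canContainExpressionDeletes` / `canContainDroppedPartitions` fallback, whose bodies are not shown in this comment.

```java
import java.util.Set;
import java.util.function.Predicate;

public class ManifestSkipSketch {

  // Hypothetical stand-in for the ManifestFile properties the check reads.
  static final class Manifest {
    final String path;
    final boolean hasAddedFiles;
    final boolean hasExistingFiles;

    Manifest(String path, boolean hasAddedFiles, boolean hasExistingFiles) {
      this.path = path;
      this.hasAddedFiles = hasAddedFiles;
      this.hasExistingFiles = hasExistingFiles;
    }
  }

  // A manifest with neither added nor existing entries has nothing left to delete.
  static boolean hasNoLiveFiles(Manifest manifest) {
    return !manifest.hasAddedFiles && !manifest.hasExistingFiles;
  }

  // Mirrors the order of checks suggested in the review comment.
  // conservativeCheck is an assumption standing in for the three fallback checks.
  static boolean canContainDeletedFiles(
      Manifest manifest,
      boolean trustManifestReferences,
      Set<String> manifestsWithDeletes,
      Predicate<Manifest> conservativeCheck) {
    if (hasNoLiveFiles(manifest)) {
      return false;
    }

    if (trustManifestReferences) {
      // When the referenced set is trusted, membership alone decides.
      return manifestsWithDeletes.contains(manifest.path);
    }

    return conservativeCheck.test(manifest);
  }

  public static void main(String[] args) {
    Set<String> referenced = Set.of("m1.avro");
    Predicate<Manifest> mayMatch = m -> true; // pessimistic fallback for the demo

    Manifest m1 = new Manifest("m1.avro", true, false);
    Manifest m2 = new Manifest("m2.avro", false, true);
    Manifest empty = new Manifest("m3.avro", false, false);

    System.out.println(canContainDeletedFiles(m1, true, referenced, mayMatch));    // true
    System.out.println(canContainDeletedFiles(m2, true, referenced, mayMatch));    // false
    System.out.println(canContainDeletedFiles(m2, false, referenced, mayMatch));   // true
    System.out.println(canContainDeletedFiles(empty, false, referenced, mayMatch)); // false
  }
}
```

In this sketch the empty-manifest check comes first, so a manifest with no live files is skipped even if it appears in the referenced set, and the trusted set short-circuits the more expensive fallback checks.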
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
