aokolnychyi commented on code in PR #11131: URL: https://github.com/apache/iceberg/pull/11131#discussion_r1849434684
##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -307,12 +331,22 @@ private void invalidateFilteredCache() {
   /**
    * @return a ManifestReader that is a filtered version of the input manifest.
    */
-  private ManifestFile filterManifest(Schema tableSchema, ManifestFile manifest) {
+  private ManifestFile filterManifest(
+      Schema tableSchema, ManifestFile manifest, boolean trustReferencedManifests) {
     ManifestFile cached = filteredManifests.get(manifest);
     if (cached != null) {
       return cached;
     }

+    boolean manifestIsReferenced = manifestsReferencedForDeletes.contains(manifest.path());
+
+    // The manifest does not need to be rewritten if the referenced set can be trusted and the
+    // manifest is not referenced
+    if (trustReferencedManifests && !manifestIsReferenced) {

Review Comment:
I wonder whether we can restructure this a bit as there are separate branches that basically skip rewrites. What about having a common `canContainDeletedFiles` and just doing something like this?
```
if (!canContainDeletedFiles(manifest, trustManifestReferences)) {
  filteredManifests.put(manifest, manifest);
  return manifest;
}

try (ManifestReader<F> reader = newManifestReader(manifest)) {
  PartitionSpec spec = reader.spec();
  PartitionAndMetricsEvaluator evaluator =
      new PartitionAndMetricsEvaluator(tableSchema, spec, deleteExpression);

  if (manifestHasDeletedFiles(evaluator, reader)) {
    return filterManifestWithDeletedFiles(evaluator, manifest, reader);
  } else {
    filteredManifests.put(manifest, manifest);
    return manifest;
  }
} catch (IOException e) {
  throw new RuntimeIOException(e, "Failed to close manifest: %s", manifest);
}
```

With helper methods:

```
private boolean canContainDeletedFiles(ManifestFile manifest, boolean trustManifestReferences) {
  if (hasNoLiveFiles(manifest)) {
    return false;
  }

  if (trustManifestReferences) {
    return manifestsWithDeletes.contains(manifest.path());
  }

  return canContainDroppedFiles(manifest)
      || canContainExpressionDeletes(manifest)
      || canContainDroppedPartitions(manifest);
}

private boolean hasNoLiveFiles(ManifestFile manifest) {
  return !manifest.hasAddedFiles() && !manifest.hasExistingFiles();
}
```

And an extra check in `manifestHasDeletedFiles`:

```
private boolean manifestHasDeletedFiles(
    PartitionAndMetricsEvaluator evaluator, ManifestReader<F> reader) {
  if (manifestsWithDeletes.contains(reader.file().location())) {
    return true;
  }
  ...
}
```
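To make the order of the checks in the proposed `canContainDeletedFiles` easier to reason about, here is a minimal standalone sketch. It is not the Iceberg implementation: `ManifestInfo` is a hypothetical stand-in for the few `ManifestFile` properties the check reads, and `fallbackCheck` is an assumed placeholder for the combined `canContainDroppedFiles || canContainExpressionDeletes || canContainDroppedPartitions` branch. The names `manifestsWithDeletes` and `trustManifestReferences` follow the suggestion above.

```java
import java.util.Set;
import java.util.function.Predicate;

public class ManifestSkipSketch {

  // Hypothetical stand-in for the ManifestFile properties the check reads.
  static final class ManifestInfo {
    final String path;
    final boolean hasAddedFiles;
    final boolean hasExistingFiles;

    ManifestInfo(String path, boolean hasAddedFiles, boolean hasExistingFiles) {
      this.path = path;
      this.hasAddedFiles = hasAddedFiles;
      this.hasExistingFiles = hasExistingFiles;
    }
  }

  // A manifest with neither added nor existing entries has nothing left to delete.
  static boolean hasNoLiveFiles(ManifestInfo manifest) {
    return !manifest.hasAddedFiles && !manifest.hasExistingFiles;
  }

  // Mirrors the proposed helper: cheapest check first, then the trusted
  // reference set, then the broader (assumed) fallback checks.
  static boolean canContainDeletedFiles(
      ManifestInfo manifest,
      boolean trustManifestReferences,
      Set<String> manifestsWithDeletes,
      Predicate<ManifestInfo> fallbackCheck) {
    if (hasNoLiveFiles(manifest)) {
      return false;
    }

    if (trustManifestReferences) {
      return manifestsWithDeletes.contains(manifest.path);
    }

    return fallbackCheck.test(manifest);
  }

  public static void main(String[] args) {
    Set<String> manifestsWithDeletes = Set.of("m1.avro");
    ManifestInfo referenced = new ManifestInfo("m1.avro", true, false);
    ManifestInfo unreferenced = new ManifestInfo("m2.avro", true, true);
    ManifestInfo empty = new ManifestInfo("m1.avro", false, false);

    // Trusted set, manifest referenced: must be opened.
    System.out.println(canContainDeletedFiles(referenced, true, manifestsWithDeletes, m -> true));
    // Trusted set, manifest not referenced: skipped without reading it.
    System.out.println(canContainDeletedFiles(unreferenced, true, manifestsWithDeletes, m -> true));
    // Untrusted set: the decision falls through to the fallback checks.
    System.out.println(canContainDeletedFiles(unreferenced, false, manifestsWithDeletes, m -> true));
    // No live files: skipped even though the path is in the referenced set.
    System.out.println(canContainDeletedFiles(empty, true, manifestsWithDeletes, m -> true));
  }
}
```

In this sketch the "no live files" check wins over the reference set, so a referenced manifest with no added or existing entries is still skipped. Whether that ordering is what the PR wants is worth confirming.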