amogh-jahagirdar commented on code in PR #11131: URL: https://github.com/apache/iceberg/pull/11131#discussion_r1815717333
########## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ########## @@ -323,11 +345,15 @@ private ManifestFile filterManifest(Schema tableSchema, ManifestFile manifest) { PartitionSpec spec = reader.spec(); PartitionAndMetricsEvaluator evaluator = new PartitionAndMetricsEvaluator(tableSchema, spec, deleteExpression); + boolean hasDeletedFiles = manifestsReferencedForDeletes.contains(manifest.path()); + if (hasDeletedFiles) { + return filterManifestWithDeletedFiles(evaluator, manifest, reader); + } // this assumes that the manifest doesn't have files to remove and streams through the // manifest without copying data. if a manifest does have a file to remove, this will break // out of the loop and move on to filtering the manifest. - boolean hasDeletedFiles = manifestHasDeletedFiles(evaluator, reader); + hasDeletedFiles = manifestHasDeletedFiles(evaluator, reader); Review Comment: Fixed! ########## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ########## @@ -341,43 +375,36 @@ private ManifestFile filterManifest(Schema tableSchema, ManifestFile manifest) { } private boolean canContainDeletedFiles(ManifestFile manifest) { - boolean canContainExpressionDeletes; + return canContainDroppedFiles(manifest) + || canContainExpressionDeletes(manifest) + || canContainDroppedPartitions(manifest); + } + + private boolean canContainExpressionDeletes(ManifestFile manifest) { if (deleteExpression != null && deleteExpression != Expressions.alwaysFalse()) { ManifestEvaluator manifestEvaluator = ManifestEvaluator.forRowFilter( deleteExpression, specsById.get(manifest.partitionSpecId()), caseSensitive); - canContainExpressionDeletes = manifestEvaluator.eval(manifest); - } else { - canContainExpressionDeletes = false; + return manifestEvaluator.eval(manifest); } - boolean canContainDroppedPartitions; + return false; + } + + private boolean canContainDroppedPartitions(ManifestFile manifest) { if (!dropPartitions.isEmpty()) { - canContainDroppedPartitions = - ManifestFileUtil.canContainAny(manifest, dropPartitions, specsById); - } else { - canContainDroppedPartitions = false; + return ManifestFileUtil.canContainAny(manifest, dropPartitions, specsById); } - boolean canContainDroppedFiles; + return false; + } + + private boolean canContainDroppedFiles(ManifestFile manifest) { if (!deletePaths.isEmpty()) { - canContainDroppedFiles = true; - } else if (!deleteFiles.isEmpty()) { - // because there were no path-only deletes, the set of deleted file partitions is valid - canContainDroppedFiles = - ManifestFileUtil.canContainAny(manifest, deleteFilePartitions, specsById); + return true; } else { - canContainDroppedFiles = false; + return ManifestFileUtil.canContainAny(manifest, deleteFilePartitions, specsById); } - - boolean canContainDropBySeq = - manifest.content() == ManifestContent.DELETES - && manifest.minSequenceNumber() < minSequenceNumber; Review Comment: We used to rewrite a given manifest anytime it had aged deletes regardless of if the manifest needed to get rewritten for other purposes. Discussed with @rdblue and the way to look at this is that we should opportunistically remove the aged deletes when the manifest needs to get rewritten for other purposes. Otherwise we don't really need to go out of our way since it can create unnecessary failures which I think I agree with the rationale on that. I removed `canContainDropSeq` as an individual condition to drive a rewrite, which is technically a behavior change but arguably a better choice based on the argument that it avoids unnecessary failures that can happen by doing that work. I also updated TestRowDelta#testDeleteDataFileWithDeleteFile so that we still exercise the case where aged deletes do get rewritten, but the conditions for triggering that had to change in the tests. ########## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ########## @@ -162,11 +178,13 @@ void delete(F file) { void delete(CharSequence path) { Preconditions.checkNotNull(path, "Cannot delete file path: null"); invalidateFilteredCache(); + this.allDeletesReferenceManifests = false; deletePaths.add(path); } boolean containsDeletes() { - return !deletePaths.isEmpty() + return !manifestsReferencedForDeletes.isEmpty() Review Comment: I've removed this -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org