amogh-jahagirdar commented on code in PR #11131:
URL: https://github.com/apache/iceberg/pull/11131#discussion_r1815717333
##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -323,11 +345,15 @@ private ManifestFile filterManifest(Schema tableSchema,
ManifestFile manifest) {
PartitionSpec spec = reader.spec();
PartitionAndMetricsEvaluator evaluator =
new PartitionAndMetricsEvaluator(tableSchema, spec,
deleteExpression);
+ boolean hasDeletedFiles =
manifestsReferencedForDeletes.contains(manifest.path());
+ if (hasDeletedFiles) {
+ return filterManifestWithDeletedFiles(evaluator, manifest, reader);
+ }
// this assumes that the manifest doesn't have files to remove and
streams through the
// manifest without copying data. if a manifest does have a file to
remove, this will break
// out of the loop and move on to filtering the manifest.
- boolean hasDeletedFiles = manifestHasDeletedFiles(evaluator, reader);
+ hasDeletedFiles = manifestHasDeletedFiles(evaluator, reader);
Review Comment:
Fixed!
##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -341,43 +375,36 @@ private ManifestFile filterManifest(Schema tableSchema,
ManifestFile manifest) {
}
private boolean canContainDeletedFiles(ManifestFile manifest) {
- boolean canContainExpressionDeletes;
+ return canContainDroppedFiles(manifest)
+ || canContainExpressionDeletes(manifest)
+ || canContainDroppedPartitions(manifest);
+ }
+
+ private boolean canContainExpressionDeletes(ManifestFile manifest) {
if (deleteExpression != null && deleteExpression !=
Expressions.alwaysFalse()) {
ManifestEvaluator manifestEvaluator =
ManifestEvaluator.forRowFilter(
deleteExpression, specsById.get(manifest.partitionSpecId()),
caseSensitive);
- canContainExpressionDeletes = manifestEvaluator.eval(manifest);
- } else {
- canContainExpressionDeletes = false;
+ return manifestEvaluator.eval(manifest);
}
- boolean canContainDroppedPartitions;
+ return false;
+ }
+
+ private boolean canContainDroppedPartitions(ManifestFile manifest) {
if (!dropPartitions.isEmpty()) {
- canContainDroppedPartitions =
- ManifestFileUtil.canContainAny(manifest, dropPartitions, specsById);
- } else {
- canContainDroppedPartitions = false;
+ return ManifestFileUtil.canContainAny(manifest, dropPartitions,
specsById);
}
- boolean canContainDroppedFiles;
+ return false;
+ }
+
+ private boolean canContainDroppedFiles(ManifestFile manifest) {
if (!deletePaths.isEmpty()) {
- canContainDroppedFiles = true;
- } else if (!deleteFiles.isEmpty()) {
- // because there were no path-only deletes, the set of deleted file
partitions is valid
- canContainDroppedFiles =
- ManifestFileUtil.canContainAny(manifest, deleteFilePartitions,
specsById);
+ return true;
} else {
- canContainDroppedFiles = false;
+ return ManifestFileUtil.canContainAny(manifest, deleteFilePartitions,
specsById);
}
-
- boolean canContainDropBySeq =
- manifest.content() == ManifestContent.DELETES
- && manifest.minSequenceNumber() < minSequenceNumber;
Review Comment:
We used to rewrite a given manifest anytime it had aged deletes regardless
of if the manifest needed to get rewritten for other purposes. Discussed with
@rdblue and the way to look at this is that we should opportunistically remove
the aged deletes when the manifest needs to get rewritten for other purposes.
Otherwise we don't really need to go out of our way since it can create
unnecessary failures which I think I agree with the rationale on that. I
removed `canContainDropSeq` as an individual condition to drive a rewrite,
which is technically a behavior change but arguably a better choice based on
the argument that it avoids unnecessary failures that can happen by doing that
work.
I also updated TestRowDelta#testDeleteDataFileWithDeleteFile so that we
still exercise the case where aged deletes do get rewritten, but the conditions
for triggering that had to change in the tests.
##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -162,11 +178,13 @@ void delete(F file) {
void delete(CharSequence path) {
Preconditions.checkNotNull(path, "Cannot delete file path: null");
invalidateFilteredCache();
+ this.allDeletesReferenceManifests = false;
deletePaths.add(path);
}
boolean containsDeletes() {
- return !deletePaths.isEmpty()
+ return !manifestsReferencedForDeletes.isEmpty()
Review Comment:
I've removed this
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]