aokolnychyi commented on code in PR #9447:
URL: https://github.com/apache/iceberg/pull/9447#discussion_r1457935454
##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/RewriteManifestsProcedure.java:
##########
@@ -118,4 +126,15 @@ private InternalRow[] toOutputRows(RewriteManifests.Result
result) {
public String description() {
return "RewriteManifestsProcedure";
}
+
+ private RewriteManifestsSparkAction checkAndApplyFilter(
+ RewriteManifestsSparkAction action, String where, Identifier ident) {
+ if (where != null) {
+ Expression expression = filterExpression(ident, where);
Review Comment:
The reason why we did not add it in the first place is because we persist
only lower/upper boundaries for manifest partition values. Therefore, filters
may not necessarily benefit from that. Based on what I've seen, these
boundaries are rarely helpful when the manifest compaction is needed. Also, we
have done some improvements to how the manifests are rewritten (not released
yet). From what I see, it is a matter of a few minutes to rewrite all manifests
even for large tables because the operation is distributed.
@bknbkn, did you have a chance to try the improvements for the action in
master? How long does the rewrite take in your tables if you target all
manifests?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]