bknbkn commented on code in PR #9447: URL: https://github.com/apache/iceberg/pull/9447#discussion_r1458505983
########## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/RewriteManifestsProcedure.java: ########## @@ -118,4 +126,15 @@ private InternalRow[] toOutputRows(RewriteManifests.Result result) { public String description() { return "RewriteManifestsProcedure"; } + + private RewriteManifestsSparkAction checkAndApplyFilter( + RewriteManifestsSparkAction action, String where, Identifier ident) { + if (where != null) { + Expression expression = filterExpression(ident, where); Review Comment: Thanks for your review @aokolnychyi , In our scenario, `Maintenance Action` more often exists as a periodic task, we will automatically call `Maintenance Action` regularly to maintain many existing tables in the warehouse, most of tables are time-partitioned tables. Adding filter brings the following benefits: - **Reduce unnecessary manifests IO in the cluster** Within an action execution interval, only a part of the partition data of each table will be updated (in our scenario about 10% of partitions will be affected, the manifests contained remaining partitions have been aggregated when the previous action was executed, and no need to re-aggregate). For a single table, rewrite all manifests will not take more than **10 minutes** in a distributed situation, but we still hope to have the opportunity to reduce the overall pressure on the cluster. - **Conflict avoidance** When too many small files are written to the upstream of the Iceberg table at one time, it may trigger manfests merging(By ManifestMergeManager.mergeGroup). This behavior may conflict with the rewrite manifests action. Facing streaming tasks, we cannot stop it, can only be solved by retrying the action. But filter out the partition the task is currently writing can reduce the occurrence of conflicts. **By the way,** In V1 table with snapshot-id inherit=true, it still need write all manifests in driver when execute copyManifests function, I find that this propertity is no longer affects V2 table, Is any plan to remove or default true this propertity in the V1 table, or is there still any unsafe? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org