ricardopereira33 opened a new issue, #12642: URL: https://github.com/apache/iceberg/issues/12642
### Proposed Change

## Problem

We found an issue when expiring old snapshots from a table with a large number of snapshots (more than 10k). It happens when the `expireSnapshots` action triggers an `UpdateTableRequest` that cleans up most of the snapshots (~99%).

The REST Catalog server receives a single request (`UpdateTableRequest`) with the list of snapshots to be removed from the metadata. However, the metadata update is applied once per snapshot:

https://github.com/apache/iceberg/blob/03ff41c189c7420992be0e4a4ddc63f005e2e0d5/core/src/main/java/org/apache/iceberg/rest/CatalogHandlers.java#L435

The `applyTo` method updates the table for only 1 snapshot, even though the underlying method could receive the entire list of snapshots:

https://github.com/apache/iceberg/blob/03ff41c189c7420992be0e4a4ddc63f005e2e0d5/core/src/main/java/org/apache/iceberg/MetadataUpdate.java#L344

The `removeSnapshots` method delegates the work to `rewriteSnapshotInternal`:

https://github.com/apache/iceberg/blob/03ff41c189c7420992be0e4a4ddc63f005e2e0d5/core/src/main/java/org/apache/iceberg/TableMetadata.java#L1424

`rewriteSnapshotInternal` iterates over all snapshots of the table to remove the provided list of snapshots (as mentioned, **it is only passed 1 snapshot**). This is inefficient: to remove a large number of snapshots, we iterate over the entire list of snapshots (N elements) N times, which is O(N^2).

We noticed this issue when we recently enabled some tables that are written by streaming jobs (they are written to frequently and generate a lot of snapshots).
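The quadratic cost described above can be illustrated with a small, self-contained sketch. It does not use any Iceberg classes: `SnapshotRemovalCost`, `removeSnapshots`, `removeOneByOne` and `removeInBulk` are hypothetical names, and snapshots are modelled as plain `Long` ids. It only assumes, as described above, that each removal call scans every snapshot still in the metadata.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the removal path; this is not Iceberg code.
public class SnapshotRemovalCost {
  // Number of snapshot entries visited, used as a rough proxy for CPU work.
  static long visited = 0;

  // One full pass over the current snapshots, dropping every id in idsToRemove.
  static List<Long> removeSnapshots(List<Long> snapshots, Set<Long> idsToRemove) {
    List<Long> retained = new ArrayList<>();
    for (Long id : snapshots) {
      visited++;
      if (!idsToRemove.contains(id)) {
        retained.add(id);
      }
    }
    return retained;
  }

  // Mirrors the behaviour described in the issue: one pass per removed snapshot.
  static List<Long> removeOneByOne(List<Long> snapshots, List<Long> idsToRemove) {
    List<Long> current = snapshots;
    for (Long id : idsToRemove) {
      current = removeSnapshots(current, Collections.singleton(id));
    }
    return current;
  }

  // Mirrors the proposal: collect all ids first, then make a single pass.
  static List<Long> removeInBulk(List<Long> snapshots, List<Long> idsToRemove) {
    return removeSnapshots(snapshots, new HashSet<>(idsToRemove));
  }

  public static void main(String[] args) {
    int total = 10_000;
    List<Long> snapshots = new ArrayList<>();
    for (long i = 0; i < total; i++) {
      snapshots.add(i);
    }
    // Expire roughly 99% of the snapshots, as in the scenario from the issue.
    List<Long> expired = new ArrayList<>(snapshots.subList(0, total - total / 100));

    visited = 0;
    List<Long> oneByOne = removeOneByOne(snapshots, expired);
    System.out.println("one-by-one: visited=" + visited + ", retained=" + oneByOne.size());

    visited = 0;
    List<Long> bulk = removeInBulk(snapshots, expired);
    System.out.println("bulk:       visited=" + visited + ", retained=" + bulk.size());
  }
}
```

Both variants retain the same snapshots; only the number of visited entries differs. The bulk variant passes a `HashSet`, so each membership check is constant time on average and the whole removal stays linear in the number of snapshots.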
With more than 10 such tables, each having more than 10k snapshots and some more than 100k, our REST Catalog server constantly hits 100% CPU usage.

## Proposal

In `CatalogHandlers`, we could group the updates by `MetadataUpdate` type and apply them in bulk:

https://github.com/apache/iceberg/blob/03ff41c189c7420992be0e4a4ddc63f005e2e0d5/core/src/main/java/org/apache/iceberg/rest/CatalogHandlers.java#L435

### Proposal document

_No response_

### Specifications

- [x] Table
- [ ] View
- [x] REST
- [ ] Puffin
- [ ] Encryption
- [ ] Other