grantatspothero commented on PR #10523: URL: https://github.com/apache/iceberg/pull/10523#issuecomment-2251418600
> A question:
>
> I'm not exactly sure how the "table metadata READ" and "manifest list READ" translate into calls to the underlying object store. Does "manifest list READ" result in a request to list all objects matching a particular prefix? And if so, could having many manifest files result in this operation becoming slow? In that case, I assume that rewriting manifests could help such a situation?
>
> If not, what makes those operations slow? And how slow - approximately - are they, in absolute terms?

"table metadata READ" and "manifest list READ" are both single S3 GETs, so that is 2 extra network requests that are not needed to actually perform the commit.

Regarding how that affects total runtime, see the PR description:

```
We are ingesting streaming data using a java service that does iceberg FastAppend

We noticed about ~20% (YMMV) of the fastappend commit time for our usecase is spent on nonrequired cleanup operations, specifically this bit which FastAppend inherits from SnapshotProducer:
```

The extra network requests are definitely noticeable for fast appends of small files. For our use case, the Iceberg metadata files are large because there are lots of unexpired snapshots, so fetching a large metadata file from S3 is slow, which exacerbates the problem.

But if you have small metadata files and are not using FastAppend, then you probably do not care much about this optimization.
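
To make the reasoning above concrete, here is a rough back-of-envelope sketch of how two extra GETs could account for a given share of commit time, and why that share grows with metadata file size. It is only an illustrative model: the function names, the latency model (fixed first-byte latency plus size divided by throughput), and every number are hypothetical placeholders, not measurements from this PR or from S3.

```python
def get_seconds(size_bytes, first_byte_s, throughput_bytes_per_s):
    """Rough time for one object-store GET: fixed latency plus transfer time."""
    return first_byte_s + size_bytes / throughput_bytes_per_s


def cleanup_share(base_commit_s, metadata_bytes, manifest_list_bytes,
                  first_byte_s, throughput_bytes_per_s):
    """Fraction of total commit time spent on the two extra reads.

    base_commit_s is the (assumed) time the commit needs without them.
    """
    extra = (
        get_seconds(metadata_bytes, first_byte_s, throughput_bytes_per_s)
        + get_seconds(manifest_list_bytes, first_byte_s, throughput_bytes_per_s)
    )
    return extra / (base_commit_s + extra)


# Placeholder inputs, chosen only for illustration.
FIRST_BYTE_S = 0.05          # assumed per-request latency
THROUGHPUT = 100_000_000     # assumed bytes per second
BASE_COMMIT_S = 0.84         # assumed commit time without the extra reads

# Table with many unexpired snapshots: large metadata file.
large = cleanup_share(BASE_COMMIT_S, 10_000_000, 1_000_000,
                      FIRST_BYTE_S, THROUGHPUT)

# Table with few snapshots: small metadata file.
small = cleanup_share(BASE_COMMIT_S, 100_000, 1_000_000,
                      FIRST_BYTE_S, THROUGHPUT)

print(f"large metadata: {large:.1%} of commit time")
print(f"small metadata: {small:.1%} of commit time")
```

Under this model the per-request latency sets a floor on the cost of each extra GET, while the transfer term is what makes tables with many unexpired snapshots fare worse.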