Re: [PR] Core: Allow SnapshotProducer to skip uncommitted manifest cleanup after commit [iceberg]

via GitHub Thu, 25 Jul 2024 14:24:21 -0700


grantatspothero commented on PR #10523:
URL: https://github.com/apache/iceberg/pull/10523#issuecomment-2251418600


   > A question:
   > 
   > I'm not exactly sure how the "table metadata READ" and "manifest list
   > READ" translate into calls to the underlying object store. Does "manifest list
   > READ" result in a request to list all objects matching a particular prefix? And
   > if so, could having many manifest files result in this operation becoming slow?
   > In that case, I assume that rewriting manifests could help such a situation?
   > 
   > If not, what makes those operations slow? And how slow - approximately -
   > are they, in absolute terms?
   
   "table metadata READ" and "manifest list READ" are each a single S3 GET (not 
a prefix listing), so that is 2 extra network requests that are not needed to 
actually perform the commit.
   
   Regarding how that affects total runtime, see the PR description:
   ```
   We are ingesting streaming data using a java service that does iceberg 
FastAppend
   We noticed about ~20% (YMMV) of the fastappend commit time for our usecase 
is spent on nonrequired cleanup operations, specifically this bit which 
FastAppend inherits from SnapshotProducer:
   ```
   The extra network requests are definitely noticeable for fast appends of 
small files. In our use case, the Iceberg metadata files are large because 
there are lots of unexpired snapshots, so fetching a large metadata file from S3 
is slow, and that exacerbates the problem.
   
   But if you have small metadata files and are not using FastAppend, then you 
probably do not care much about this optimization.
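
To make the arithmetic behind the ~20% figure concrete, here is a rough back-of-the-envelope sketch. It assumes the cleanup cost is just the two GETs described above and that commits run back to back; the class name, method names, and the millisecond values in `main` are hypothetical and chosen only so the fraction comes out at 20%. They are not measurements from this PR.

```java
// Hypothetical model, not part of Iceberg: estimates how much of a commit
// is spent on the two cleanup reads and what skipping them would save.
final class CleanupOverhead {
    private CleanupOverhead() {}

    // Fraction of total commit time spent on the two cleanup GETs.
    // requiredMillis: time for the work the commit actually needs.
    // metadataGetMillis / manifestListGetMillis: time for each extra GET.
    static double cleanupFraction(double requiredMillis,
                                  double metadataGetMillis,
                                  double manifestListGetMillis) {
        double cleanup = metadataGetMillis + manifestListGetMillis;
        return cleanup / (requiredMillis + cleanup);
    }

    // Speedup of back-to-back commits if the cleanup share is removed.
    static double speedup(double cleanupFraction) {
        if (cleanupFraction < 0.0 || cleanupFraction >= 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1)");
        }
        return 1.0 / (1.0 - cleanupFraction);
    }

    public static void main(String[] args) {
        // Assumed timings: 80 ms of required work, 15 ms + 5 ms of cleanup GETs.
        double fraction = cleanupFraction(80.0, 15.0, 5.0);
        System.out.printf("cleanup share of commit time: %.0f%%%n", fraction * 100);
        System.out.printf("speedup if skipped: %.2fx%n", speedup(fraction));
    }
}
```

Under this model a larger table metadata file raises `metadataGetMillis` and therefore the cleanup share, which matches the observation that many unexpired snapshots make the problem worse.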


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


