Fokko commented on code in PR #10246:
URL: https://github.com/apache/iceberg/pull/10246#discussion_r1584982185

##########
core/src/main/java/org/apache/iceberg/FastAppend.java:
##########
@@ -156,6 +156,8 @@ public List<ManifestFile> apply(TableMetadata base, Snapshot snapshot) {
       manifests.addAll(snapshot.allManifests(ops.io()));
     }
 
+    manifests.forEach(summaryBuilder::addedManifestStats);

Review Comment:
   This is also my main question. 
   
   My train of thought: you will need to read the manifest-list in any situation. The number of manifests can vary widely:
   
   - If FastAppends are used frequently, there will be many small manifests that you want to bundle into batches.
   - If MergeAppends are used, the manifests are rather hefty (8 megabytes by default, set using `commit.manifest.target-size-bytes`).
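
   A minimal sketch of that batching, assuming planning only needs each manifest's length. `ManifestBatching`, `ManifestEntry`, and `batch` are hypothetical names for illustration, not Iceberg API: small manifests are packed greedily up to a target size, while a full 8 MB manifest ends up in a batch of its own.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   public class ManifestBatching {
       // Hypothetical stand-in for a manifest-list entry; only path and length matter here.
       record ManifestEntry(String path, long lengthBytes) {}

       // Greedily packs manifests, in manifest-list order, into batches whose total size
       // stays at or below targetBytes. A manifest at or above the target gets its own batch.
       static List<List<ManifestEntry>> batch(List<ManifestEntry> manifests, long targetBytes) {
           List<List<ManifestEntry>> batches = new ArrayList<>();
           List<ManifestEntry> current = new ArrayList<>();
           long currentBytes = 0;
           for (ManifestEntry m : manifests) {
               // Close the current batch if adding this manifest would exceed the target.
               if (!current.isEmpty() && currentBytes + m.lengthBytes() > targetBytes) {
                   batches.add(current);
                   current = new ArrayList<>();
                   currentBytes = 0;
               }
               current.add(m);
               currentBytes += m.lengthBytes();
           }
           if (!current.isEmpty()) {
               batches.add(current);
           }
           return batches;
       }

       public static void main(String[] args) {
           long target = 8L * 1024 * 1024; // 8 MB, the default manifest target size
           List<ManifestEntry> manifests = new ArrayList<>();
           for (int i = 0; i < 100; i++) {
               // Many small manifests, as produced by frequent fast appends (assumed 200 KB each).
               manifests.add(new ManifestEntry("fast-" + i + ".avro", 200L * 1024));
           }
           // One large manifest, as produced by a merge append.
           manifests.add(new ManifestEntry("merged.avro", target));
           System.out.println(batch(manifests, target).size()); // prints 4
       }
   }
   ```

   With 100 manifests of 200 KB plus one 8 MB manifest and an 8 MB target, this yields four batches: 40 + 40 + 20 small manifests, then the large one alone.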
   
   With the knowledge from the summary, you could spin up executors before reading the manifest-list, but this is difficult in practice: you would also need to know the sizes of the manifests to plan effectively.
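
   To illustrate why the count alone is not enough, here is a rough sketch that assumes the summary exposed both a manifest count and a total manifest size (hypothetical inputs; `SummaryPlanning.estimateTasks` is not an Iceberg API). Two tables with the same number of manifests can need very different parallelism:

   ```java
   public class SummaryPlanning {
       // Hypothetical estimate of planning tasks if manifests are packed up to targetBytes.
       // ceil(total / target) is a lower bound on the number of batches; the manifest
       // count is an upper bound, so the estimate is clamped between the two.
       static long estimateTasks(long manifestCount, long totalManifestBytes, long targetBytes) {
           if (manifestCount == 0) {
               return 0;
           }
           long bySize = (totalManifestBytes + targetBytes - 1) / targetBytes; // ceiling division
           return Math.min(manifestCount, Math.max(1, bySize));
       }

       public static void main(String[] args) {
           long target = 8L * 1024 * 1024;
           // Same manifest count, very different amounts of work:
           System.out.println(estimateTasks(100, 100 * 200L * 1024, target)); // prints 3
           System.out.println(estimateTasks(100, 100 * target, target));      // prints 100
       }
   }
   ```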
   
   The downside is that we add extra information to the metadata-JSON, which can grow large when there are many commits, since every snapshot keeps its own summary.
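
   A back-of-the-envelope sketch of that cost, with made-up numbers: each snapshot entry in the metadata JSON carries its own summary, and the whole file is rewritten on every commit, so the extra bytes scale with the number of retained snapshots. The snapshot count and per-property size below are assumptions, not measurements.

   ```java
   public class MetadataGrowth {
       // Rough estimate of extra metadata JSON bytes when every retained snapshot's
       // summary carries additional properties. All inputs are illustrative assumptions.
       static long extraBytes(long retainedSnapshots, int extraProperties, int bytesPerProperty) {
           return retainedSnapshots * extraProperties * bytesPerProperty;
       }

       public static void main(String[] args) {
           // Assumed: 10,000 retained snapshots, 3 extra summary properties of ~40 bytes each.
           System.out.println(extraBytes(10_000, 3, 40)); // prints 1200000
       }
   }
   ```

   Expiring snapshots bounds this growth, since expired snapshots are dropped from the metadata JSON along with their summaries.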



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
