nk1506 commented on code in PR #10246:
URL: https://github.com/apache/iceberg/pull/10246#discussion_r1585139953


##########
core/src/main/java/org/apache/iceberg/FastAppend.java:
##########
@@ -156,6 +156,8 @@ public List<ManifestFile> apply(TableMetadata base, Snapshot snapshot) {
       manifests.addAll(snapshot.allManifests(ops.io()));
     }
 
+    manifests.forEach(summaryBuilder::addedManifestStats);

Review Comment:
   Thanks @Fokko for the feedback.
   
   > With the knowledge from the summary, you could spin up executors before 
reading the manifest-list, but this can be difficult since you would also need 
to know the sizes of the manifest to do some effective planning.
   
   _Manifest stats in the summary can help plan manifest file scans better. 
Without these stats in the metadata JSON, a planner either has to guess or do 
one more IO to read the manifest list._ 
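
A minimal sketch of the planning idea, not the PR's implementation. The summary key names below are hypothetical placeholders, and it assumes the summary carries both a manifest count and a total manifest size, which the PR may or may not provide:

```java
import java.util.Map;

public class ManifestPlanningSketch {
  // Hypothetical summary keys, for illustration only; not the names used by the PR.
  static final String MANIFEST_COUNT = "example-manifest-count";
  static final String MANIFEST_BYTES = "example-manifest-total-bytes";

  /**
   * Estimates how many manifest-scan tasks to start using summary values alone.
   * Returns -1 when the stats are missing, meaning the planner has to fall back
   * to reading the manifest list (the extra IO discussed above).
   */
  static int estimateScanTasks(Map<String, String> summary, long targetBytesPerTask) {
    String count = summary.get(MANIFEST_COUNT);
    String bytes = summary.get(MANIFEST_BYTES);
    if (count == null || bytes == null) {
      return -1;
    }

    long manifestCount = Long.parseLong(count);
    long totalBytes = Long.parseLong(bytes);
    if (manifestCount == 0) {
      return 0;
    }

    // Ceiling division: tasks needed if each task reads about targetBytesPerTask.
    long bySize = (totalBytes + targetBytesPerTask - 1) / targetBytesPerTask;

    // At least one task, and never more tasks than there are manifests.
    return (int) Math.max(1, Math.min(manifestCount, bySize));
  }

  public static void main(String[] args) {
    Map<String, String> summary =
        Map.of(MANIFEST_COUNT, "12", MANIFEST_BYTES, "100000000");
    System.out.println(estimateScanTasks(summary, 32L * 1024 * 1024));
    System.out.println(estimateScanTasks(Map.of(), 32L * 1024 * 1024));
  }
}
```

With only a count and no sizes, the same sketch would have to assume an average manifest size, which is the guessing case mentioned above.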
   
   > The downside is that we add extra information to the metadata-JSON, which 
can also grow in size when there are many commits.
   
   _I agree it adds some extra bytes to the metadata JSON, but I think we can 
keep that growth in check by expiring snapshots. I don't see any other way to 
avoid the extra IO for planning._ 
   
   WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

