Re: [PR] Add ManifestFile Stats in snapshot summary. [iceberg]

via GitHub Thu, 02 May 2024 16:00:41 -0700


amogh-jahagirdar commented on code in PR #10246:
URL: https://github.com/apache/iceberg/pull/10246#discussion_r1588511039



##########
core/src/main/java/org/apache/iceberg/FastAppend.java:
##########
@@ -156,6 +156,8 @@ public List<ManifestFile> apply(TableMetadata base, Snapshot snapshot) {
       manifests.addAll(snapshot.allManifests(ops.io()));
     }
 
+    manifests.forEach(summaryBuilder::addedManifestStats);

Review Comment:
   @nk1506 I think the main question (at least from me) is essentially: 
   
   Is information on just the total number of data or delete manifests actually 
helpful for whatever planning estimation you're trying to do? 
   
   As @Fokko said, you'll probably want to know the sizes (in bytes) of the 
manifests involved in planning, since they can vary depending on the strategy 
used for the append. I think that estimating based only on the number of files 
(and not including sizes, which means you have to read the manifests anyway) 
would lead to a lot of overprovisioning and underprovisioning (if you're trying 
to do distributed planning).
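
   To make the count-versus-size point concrete, here is a minimal sketch. It 
does not use the Iceberg API: `ManifestInfo`, `Content`, `tasksByCount`, and 
`tasksBySize` are hypothetical names invented for illustration, and the 
per-task targets are arbitrary assumptions.

```java
import java.util.List;

// Illustrative sketch only: hypothetical types, not Iceberg's ManifestFile API.
class ManifestSizeSketch {
  enum Content { DATA, DELETES }

  // Hypothetical stand-in holding just the fields the estimate needs.
  static final class ManifestInfo {
    final Content content;
    final long lengthBytes;

    ManifestInfo(Content content, long lengthBytes) {
      this.content = content;
      this.lengthBytes = lengthBytes;
    }
  }

  // Total manifest bytes for one content type (data or deletes).
  static long totalBytes(List<ManifestInfo> manifests, Content content) {
    return manifests.stream()
        .filter(m -> m.content == content)
        .mapToLong(m -> m.lengthBytes)
        .sum();
  }

  // Count-based estimate: every manifest is treated as equal work.
  static long tasksByCount(List<ManifestInfo> manifests, int manifestsPerTask) {
    return (manifests.size() + manifestsPerTask - 1) / manifestsPerTask;
  }

  // Size-based estimate: ceil(total bytes / target bytes per task).
  static long tasksBySize(List<ManifestInfo> manifests, long targetBytesPerTask) {
    long total = manifests.stream().mapToLong(m -> m.lengthBytes).sum();
    return (total + targetBytesPerTask - 1) / targetBytesPerTask;
  }
}
```

   With one 64 MiB data manifest, one 1 MiB data manifest, and one 1 MiB delete 
manifest, a count-based estimate at one manifest per task gives 3 tasks, while 
a size-based estimate at an assumed 8 MiB per task gives 9. The two diverge as 
soon as manifest sizes are uneven.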
    
   But I'm definitely open to hearing more, especially if you are able to share 
any data points on how this metric helped; that would give us some more 
confidence that this is useful and can generalize.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-h...@iceberg.apache.org
