deniskuzZ commented on PR #11669:
URL: https://github.com/apache/iceberg/pull/11669#issuecomment-2505896721

   @pvary, unfortunately, that won't work. I was looking for an easy way to get 
basic partition stats, but I missed that Iceberg only keeps the changed 
partitions in a SnapshotSummary. Aggregating with just the previous snapshot's 
values is not enough; it requires looping through all the snapshots.
   
   ````
   table.newFastAppend().appendFile(FILE_A).commit();
   partitions.data_bucket=0 -> added-data-files=1,added-records=1,added-files-size=10,total-records=3,total-files-size=30,total-data-files=3,total-delete-files=0,total-position-deletes=0,total-equality-deletes=0
   
   table.newFastAppend().appendFile(FILE_B).commit();
   partitions.data_bucket=1 -> added-data-files=1,added-records=1,added-files-size=10,total-records=2,total-files-size=20,total-data-files=2,total-delete-files=0,total-position-deletes=0,total-equality-deletes=0
   
   table.newFastAppend().appendFile(FILE_A).commit();
   partitions.data_bucket=0 -> added-data-files=1,added-records=1,added-files-size=10,total-records=3,total-files-size=30,total-data-files=3,total-delete-files=0,total-position-deletes=0,total-equality-deletes=0
   ````
   
   Do you think it's worth doing this in SnapshotSummary, or is there some 
simpler/better way, like creating or updating the partition stats Puffin file 
right after the commit?
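
   For illustration, the "loop through all the snapshots" idea could look roughly like the sketch below. It is not Iceberg code: `PartitionStatsSketch`, `parseMetrics`, and `latestByPartition` are hypothetical names, and the snapshot summaries are modelled as plain maps in commit order (oldest first). It assumes each `partitions.<path>` entry has the comma-separated `key=value` shape shown above and that the most recent entry for a partition carries its current totals.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Iceberg.
class PartitionStatsSketch {
  // Assumed prefix of per-partition summary keys, as seen in the output above.
  static final String PREFIX = "partitions.";

  // Parses "k1=v1,k2=v2" into an insertion-ordered map.
  static Map<String, String> parseMetrics(String value) {
    Map<String, String> metrics = new LinkedHashMap<>();
    for (String pair : value.split(",")) {
      int eq = pair.indexOf('=');
      if (eq > 0) {
        metrics.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
      }
    }
    return metrics;
  }

  // Walks the summaries oldest to newest. Each summary only mentions the
  // partitions it changed, so a later entry overwrites an earlier one for the
  // same partition, while untouched partitions keep their last seen metrics.
  static Map<String, Map<String, String>> latestByPartition(
      List<Map<String, String>> summaries) {
    Map<String, Map<String, String>> latest = new LinkedHashMap<>();
    for (Map<String, String> summary : summaries) {
      for (Map.Entry<String, String> entry : summary.entrySet()) {
        if (entry.getKey().startsWith(PREFIX)) {
          String partition = entry.getKey().substring(PREFIX.length());
          latest.put(partition, parseMetrics(entry.getValue()));
        }
      }
    }
    return latest;
  }
}
```

   With a real table, the input list would presumably be built from `Snapshot.summary()` for each snapshot in the current branch's history. The cost grows with the number of snapshots, and expired snapshots are no longer visible, which is part of why a stats file written at commit time may be the simpler option.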


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

