grantatspothero commented on PR #16207:
URL: https://github.com/apache/iceberg/pull/16207#issuecomment-4373457460

   Our problem was excessive memory usage due to caching TableMetadata on the 
client side.
   
   Storing a `List<HistoryEntry>` in memory is fine for small numbers of 
snapshots, but each entry takes ~32 bytes, and the total grows quickly when a 
single coordinator service caches Iceberg metadata for many tables in memory. 
   
   Example:
   - metadata for 1000 tables cached in memory
   - each table commits every 30s, with 30 days of snapshot retention = 
2*60*24*30 = 86,400, or ~100K snapshots in Iceberg metadata
   - 32 bytes * 100K = 3.2 MB per table
   - 3.2 MB/table * 1000 tables = 3.2 GB
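   
   The estimate above can be reproduced with a small sketch. The class and 
method names here are hypothetical (they are not part of Iceberg), and the 
32-byte entry size is the approximation quoted in this comment, not a measured 
value. Without rounding the snapshot count up to 100K, 1000 tables come out at 
roughly 2.8 GB.

```java
public final class SnapshotLogEstimate {
    // Assumption: ~32 bytes per HistoryEntry, as quoted in the comment.
    public static final long BYTES_PER_ENTRY = 32L;

    /** Snapshots retained per table for a commit interval and retention window. */
    public static long snapshotsPerTable(long commitIntervalSeconds, long retentionDays) {
        long secondsRetained = retentionDays * 24L * 60L * 60L;
        return secondsRetained / commitIntervalSeconds;
    }

    /** Resident bytes held by the snapshot log across all cached tables. */
    public static long totalBytes(long tables, long snapshotsPerTable) {
        return tables * snapshotsPerTable * BYTES_PER_ENTRY;
    }

    public static void main(String[] args) {
        long snapshots = snapshotsPerTable(30, 30);    // 86,400 snapshots
        long perTable = snapshots * BYTES_PER_ENTRY;   // 2,764,800 bytes (~2.8 MB)
        long total = totalBytes(1000, snapshots);      // 2,764,800,000 bytes (~2.8 GB)
        System.out.printf(
            "snapshots/table=%d, bytes/table=%d, total bytes=%d%n",
            snapshots, perTable, total);
    }
}
```

   The table count and commit interval are parameters, so the same sketch 
can be used to size other deployments.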
   
   Note: this is "resident set size", not "total allocations". Total 
allocations tend to be significantly higher because of the intermediate 
allocations made while parsing JSON.
   
   For multi-tenant coordinator services (e.g. commit services, cache 
services) this memory usage is a problem. The biggest memory hog is by far the 
snapshots array, but snapshotLog is the next biggest. Since we already defer 
snapshots, it seemed reasonable to defer snapshotLog as well. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

