Declow commented on issue #2325:
URL: 
https://github.com/apache/iceberg-python/issues/2325#issuecomment-3625428629

   @andormarkus 
   
   I ended up disabling the cache completely as well.
   
   > This approach gives me:
   > During execution: Full benefit of the cache (faster performance)
   > Between invocations: Memory growth is limited/bounded
   
   Whether that holds depends on your use case.
   In my case, I consume a stream of events and append the data to Iceberg.
   As currently implemented, the cache gives no performance benefit for this workload.
   Each append to Iceberg creates a new key in the cache, like so:
   [key1] -> list of manifests [manifest1]
   A second append then adds:
   [key2] -> list of manifests [manifest1, manifest2]
   
   The first key is never looked up again; the cache simply gains a new key 
with every append. Many of the same manifest files end up referenced multiple 
times in the cache, which is why overall memory grows so rapidly.
   So if your use case is only writing data to Iceberg, you might as well 
disable the cache.
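
   To make the growth pattern concrete, here is a toy simulation. It is only a sketch and not PyIceberg code: `simulate_appends`, the plain dict, and the `keyN` / `manifestN` names are hypothetical stand-ins for the real cache and its entries. It assumes every append produces a new key whose value lists all earlier manifests plus one new one, as described above.

```python
from typing import Dict, List, Tuple


def simulate_appends(n_appends: int) -> Dict[str, Tuple[str, ...]]:
    """Toy model of the cache after n append-only commits.

    Assumption: each append adds one manifest and stores the full,
    updated manifest list under a brand-new cache key.
    """
    cache: Dict[str, Tuple[str, ...]] = {}
    manifests: List[str] = []
    for i in range(1, n_appends + 1):
        manifests.append(f"manifest{i}")
        # Older keys are never read again in an append-only workload,
        # but they stay in the cache alongside the new one.
        cache[f"key{i}"] = tuple(manifests)
    return cache


cache = simulate_appends(4)
total_refs = sum(len(entry) for entry in cache.values())
unique_manifests = {m for entry in cache.values() for m in entry}
print(total_refs, len(unique_manifests))  # 10 4
```

   In this model, n appends leave n(n+1)/2 manifest references in the cache for only n distinct manifests, so the number of references grows quadratically while none of the older keys is ever hit. How much memory each reference costs in practice depends on whether cached entries share the underlying manifest objects, which this sketch does not model.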


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

