Declow commented on issue #2325: URL: https://github.com/apache/iceberg-python/issues/2325#issuecomment-3625428629
@andormarkus I ended up disabling the cache completely as well.

> This approach gives me:
> During execution: Full benefit of the cache (faster performance)
> Between invocations: Memory growth is limited/bounded

Whether this really holds depends on your use case. In mine, I consume a stream of events and append the data to Iceberg, and the cache as it currently works gives no additional performance there.

Each append to Iceberg creates a new key in the cache, like so:

[key1] -> list of manifests [manifest1]

A second append adds:

[key2] -> list of manifests [manifest1, manifest2]

The first key is never used again; the cache just gains a new key. As a result, many of the same manifest files end up referenced multiple times in the cache, which is why overall memory grows so rapidly.

So if your use case is only writing data to Iceberg, you might as well disable the cache.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
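For anyone who wants to see the growth pattern from the comment in numbers, here is a rough toy model in plain Python. It does not call PyIceberg at all: `Manifest`, `simulate_appends`, and the two counting helpers are hypothetical names made up for this sketch, and it assumes that each cache entry holds its own freshly parsed manifest objects, which may not match the real implementation exactly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifest:
    """Hypothetical stand-in for one parsed manifest entry."""
    path: str


def simulate_appends(n_appends: int) -> dict[str, tuple[Manifest, ...]]:
    """Toy model: every append writes a new manifest list, so it adds a new cache key."""
    cache: dict[str, tuple[Manifest, ...]] = {}
    manifest_paths: list[str] = []
    for i in range(1, n_appends + 1):
        manifest_paths.append(f"manifest{i}")
        # Assumption: each entry stores freshly parsed objects, so the same
        # manifest file is represented again under every newer key.
        cache[f"key{i}"] = tuple(Manifest(p) for p in manifest_paths)
    return cache


def cached_object_count(cache: dict[str, tuple[Manifest, ...]]) -> int:
    """Total manifest objects held across all cache entries."""
    return sum(len(manifests) for manifests in cache.values())


def distinct_manifest_count(cache: dict[str, tuple[Manifest, ...]]) -> int:
    """Number of distinct manifest files those objects refer to."""
    return len({m.path for manifests in cache.values() for m in manifests})


if __name__ == "__main__":
    cache = simulate_appends(100)
    # 100 keys, 5050 cached objects, but only 100 distinct manifests
    print(len(cache), cached_object_count(cache), distinct_manifest_count(cache))
```

In this model, n appends leave n keys holding n(n+1)/2 manifest objects for only n distinct manifest files, and a pure append workload never reads the older keys again.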
